Open questsul opened 2 weeks ago
How to reproduce: For PyIceberg, I was using a metadata file stored in Azure Blob Storage:
from pyiceberg.table import StaticTable

static_table = StaticTable.from_metadata(
    "abfs://path/metadata/example.metadata.json",
    properties={
        "adlfs.connection-string": "ADD THIS",
    },
)
Here is the Java snippet I used for verification:
package com.example;

import org.apache.iceberg.StaticTableOperations;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.inmemory.InMemoryFileIO;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaticTableExample {

    public static void main(String[] args) throws IOException {
        StaticTableExample ex = new StaticTableExample();
        ex.start();
    }

    public void start() throws IOException {
        // Load the local metadata file into an in-memory FileIO.
        InMemoryFileIO fileIO = new InMemoryFileIO();
        String metadataFilePath = "example.metadata.json";
        fileIO.addFile(metadataFilePath, readFileAsBytesNIO(metadataFilePath));

        // Parse the metadata file directly, without a catalog.
        StaticTableOperations ops = new StaticTableOperations(
            metadataFilePath,
            fileIO
        );
        TableMetadata meta = ops.current();
        System.out.println("Table location: " + meta.location());
    }

    public byte[] readFileAsBytesNIO(String filePath) throws IOException {
        Path path = Path.of(filePath);
        return Files.readAllBytes(path);
    }
}
I'm not entirely sure if I'm looking at the correct code, but it seems that in Java, the operation field might be optional during parsing. For example, it appears that operation can be set to null:
The metadata.json file I provided was produced by Snowflake. After Snowflake made some updates to their Iceberg implementation, they began creating metadata files in this format. Previously, there were no issues reading Snowflake Iceberg tables using PyIceberg.
Hi @questsul - thank you for raising this issue, and for providing this analysis. The optional and required attributes in PyIceberg are based on the nullability of the objects as they are defined in the REST Catalog OpenAPI spec. Here, the operation field in summary is marked as a required attribute:
I think there are a few takeaways here based on our findings:
Java is interestingly more graceful in parsing the operation tag (and it probably should not be)
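The difference between the two behaviors can be illustrated with a small sketch. The classes below are hypothetical stand-ins, not PyIceberg's actual Summary or Iceberg Java's parser; they only show how a required `operation` argument fails on a summary without that key, while an optional one silently falls back to a null value.

```python
from typing import Optional


class StrictSummary:
    """Hypothetical stand-in: `operation` is required, as in the spec."""

    def __init__(self, operation: str, **properties: str) -> None:
        self.operation = operation
        self.properties = properties


class LenientSummary:
    """Hypothetical stand-in: `operation` may be absent and defaults to None."""

    def __init__(self, operation: Optional[str] = None, **properties: str) -> None:
        self.operation = operation
        self.properties = properties


def parse(cls, raw: dict):
    """Build a summary object from a raw summary dictionary."""
    return cls(**raw)


# Made-up summary without an "operation" key, for illustration only.
RAW = {"added-records": "10"}

if __name__ == "__main__":
    print(parse(LenientSummary, RAW).operation)
    try:
        parse(StrictSummary, RAW)
    except TypeError as exc:
        print(f"strict parsing failed: {exc}")
```

Which behavior is preferable depends on whether the reader should enforce the spec or tolerate files from writers that deviate from it.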
@sungwy does this mean that the current Java implementation does not adhere to the spec? If so, we should open a ticket to track it.
Apache Iceberg version
0.6.0
Please describe the bug 🐞
When attempting to read the metadata.json file, which contains a list of snapshots where some snapshot summaries lack the operation field, PyIceberg encounters the following error: TypeError: Summary.__init__() missing 1 required positional argument: 'operation'.
Interestingly, when parsing the same metadata file using the Iceberg Java library, it works without any issues.
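To check whether a given metadata file is affected, a minimal sketch such as the following could list the snapshots whose summary has no operation entry. It uses only the standard library and the field names from the Iceberg table metadata format (`snapshots`, `snapshot-id`, `summary`, `operation`); the helper name and the sample document are made up for illustration and are not taken from the file in this report.

```python
import json


def snapshots_missing_operation(metadata: dict) -> list:
    """Return the snapshot-ids whose summary has no 'operation' key.

    Snapshots without any summary at all are reported as well.
    """
    missing = []
    for snapshot in metadata.get("snapshots", []):
        summary = snapshot.get("summary") or {}
        if "operation" not in summary:
            missing.append(snapshot.get("snapshot-id"))
    return missing


# Hypothetical, heavily trimmed metadata used only for illustration.
SAMPLE = json.loads("""
{
  "format-version": 2,
  "snapshots": [
    {"snapshot-id": 1, "summary": {"operation": "append"}},
    {"snapshot-id": 2, "summary": {"added-records": "10"}}
  ]
}
""")

if __name__ == "__main__":
    print(snapshots_missing_operation(SAMPLE))
```

For a real file, the same function can be applied to the result of `json.load` on a locally downloaded copy of the metadata.json.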
Full stack trace:
Metadata.json example: