apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Including Iceberg Version in metadata json file for better traceability of PendingUpdate #11471

Open rice668 opened 2 weeks ago

rice668 commented 2 weeks ago

Feature Request / Improvement

When a new feature is released in Iceberg, engines (Spark, Trino, Presto, Flink) and some clients need to upgrade their Iceberg versions accordingly. However, an individual engine is sometimes left on an older version, which leads to unexpected behavior. We propose including the Iceberg version in the metadata JSON file when writing out updates. This would make it quick to identify which engine or client wrote the current PendingUpdate, and much easier to troubleshoot issues.

Query engine

Other

Willingness to contribute

nastra commented 2 weeks ago

@rice668 the Iceberg version should already be included in the summary of a particular snapshot.
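
For anyone who wants to check this on an existing table, here is a minimal sketch that reads the writer information out of a table metadata JSON file. It assumes the summary keys are named `iceberg-version` and `engine-name`; verify that against your own metadata files. The sample document, its version string, and the helper name `writer_versions` are made up for illustration.

```python
import json

# Hypothetical, trimmed-down metadata JSON. Real files carry many more fields,
# and the version string below is a placeholder, not a real release.
SAMPLE_METADATA = """
{
  "format-version": 2,
  "current-snapshot-id": 2,
  "snapshots": [
    {
      "snapshot-id": 1,
      "summary": {
        "operation": "append",
        "engine-name": "spark",
        "iceberg-version": "Apache Iceberg X.Y.Z (commit abc1234)"
      }
    },
    {
      "snapshot-id": 2,
      "summary": {"operation": "overwrite"}
    }
  ]
}
"""


def writer_versions(metadata_json):
    """Map each snapshot id to the writer info found in its summary.

    Values are None when the writer did not record the key
    (assumed key names: "iceberg-version", "engine-name").
    """
    metadata = json.loads(metadata_json)
    result = {}
    for snapshot in metadata.get("snapshots", []):
        summary = snapshot.get("summary", {})
        result[snapshot["snapshot-id"]] = {
            "iceberg-version": summary.get("iceberg-version"),
            "engine-name": summary.get("engine-name"),
        }
    return result


if __name__ == "__main__":
    for snapshot_id, info in writer_versions(SAMPLE_METADATA).items():
        print(snapshot_id, info)
```

A snapshot whose summary lacks the key shows up with `None`, which is itself a hint that it was written by an older or non-standard client. Note that this only covers commits that produce a snapshot; metadata-only updates leave no summary to inspect.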

rice668 commented 2 weeks ago

Thanks @nastra! If the version is recorded only in the snapshot summary, troubleshooting is still inconvenient. What we need is the version for any PendingUpdate, not just for a SnapshotUpdate. It would be better to write the version information into the metadata file name, which is much more convenient.

rice668 commented 2 weeks ago

Answering my own question: IcebergBuild#fullVersion can be used for this.

FYI: https://github.com/apache/iceberg/pull/5236