apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

What's the use of old metadata files, and why aren't they deleted by default? #11206

Open · madeirak opened this issue 1 month ago

madeirak commented 1 month ago

Query engine

Spark HiveCatalog

Question

Every metadata file stores the full set of snapshots that exist at that time, so why aren't old metadata files deleted by default? Users must manually set write.metadata.delete-after-commit.enabled and write.metadata.previous-versions-max.
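The effect of the two properties can be sketched as a simplified Python model, for illustration only. It assumes that write.metadata.delete-after-commit.enabled defaults to false, that write.metadata.previous-versions-max defaults to 100, and that when deletion is enabled the table keeps the current metadata file plus at most previous-versions-max earlier ones. The classes TableModel and MetadataFile are hypothetical and are not part of Iceberg; on a real table these would be set as table properties (for example with Spark SQL's ALTER TABLE ... SET TBLPROPERTIES).

```python
from dataclasses import dataclass, field


@dataclass
class MetadataFile:
    """Hypothetical stand-in for one vN.metadata.json file."""
    version: int
    snapshot_ids: list[int]  # full snapshot list at commit time, not a delta


@dataclass
class TableModel:
    """Hypothetical, simplified model of metadata-file retention.

    delete_after_commit mirrors write.metadata.delete-after-commit.enabled;
    previous_versions_max mirrors write.metadata.previous-versions-max.
    """
    delete_after_commit: bool = False
    previous_versions_max: int = 100
    files: list[MetadataFile] = field(default_factory=list)

    def commit(self, new_snapshot_id: int) -> None:
        prior = self.files[-1].snapshot_ids if self.files else []
        version = self.files[-1].version + 1 if self.files else 1
        # Every commit writes a brand-new file that repeats all live snapshots.
        self.files.append(MetadataFile(version, prior + [new_snapshot_id]))
        if self.delete_after_commit:
            # Keep the current file plus at most N previous files.
            keep = self.previous_versions_max + 1
            self.files = self.files[-keep:]


default = TableModel()
pruned = TableModel(delete_after_commit=True, previous_versions_max=2)
for snap in range(1, 6):
    default.commit(snap)
    pruned.commit(snap)

print([f.version for f in default.files])  # [1, 2, 3, 4, 5]
print([f.version for f in pruned.files])   # [3, 4, 5]
print(pruned.files[-1].snapshot_ids)       # [1, 2, 3, 4, 5]
```

With the defaults every file stays on storage; with deletion enabled only the newest few remain, and the current file still lists every snapshot because each file is a full copy.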

eric-maynard commented 4 weeks ago

Keeping old metadata helps support rollback and time travel. It's often useful to know what the state of the table was at a certain point in time, or to be able to run a query against the table as it was at some point in time.

madeirak commented 4 weeks ago


Thanks for the reply. What if I have already deleted old data by expiring old snapshots? Why are the old metadata files ending in ".json" still retained? Also, every new ".json" metadata file contains the full snapshot information rather than incremental changes, so why not delete old metadata files automatically?
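The reasoning in this question can be made concrete with a toy Python model. It assumes, as the question states, that each metadata file is a full copy rather than a delta, that expiring snapshots is itself a commit that writes a new metadata file, and that delete-after-commit is left disabled so earlier files stay in place. The helper write_metadata and the tuple-per-file representation are hypothetical and are not Iceberg's API.

```python
def write_metadata(history: list[tuple[int, ...]], live_snapshots: list[int]) -> None:
    """Hypothetical commit: write a new metadata 'file' holding ALL live snapshots."""
    history.append(tuple(live_snapshots))


history: list[tuple[int, ...]] = []  # one entry per vN.metadata.json, oldest first
live: list[int] = []

# Four ordinary commits, each adding one snapshot.
for snap in range(1, 5):
    live.append(snap)
    write_metadata(history, live)

# "Expire" snapshots 1 and 2: this is itself a commit that writes a new file.
live = [s for s in live if s > 2]
write_metadata(history, live)

# Older files are untouched and still list the expired snapshot 1.
files_still_listing_1 = [v for v, f in enumerate(history, start=1) if 1 in f]
total_entries = sum(len(f) for f in history)

print(history[-1])            # (3, 4)
print(files_still_listing_1)  # [1, 2, 3, 4]
print(total_entries)          # 12
```

Under these assumptions, expiring snapshots only shrinks the newest file, while the snapshot entries stored across all retained files grow roughly quadratically with the number of commits.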