apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Catalogs Do Not Easily Support Full State Rollback #1944

Closed johnclara closed 6 months ago

johnclara commented 3 years ago

One of our tables had a TableMetadata that somehow referenced a missing ManifestList. We're still investigating the root cause of the corrupt state.

Because of the corrupt state, we decided not to use the normal table-level rollback. It looks like that rollback commits snapshots, which would keep the corrupt snapshot in the snapshot history (we're not 100% sure of the code path for this). https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/Rollback.java

Since ManifestLists are lazily evaluated, our control plane was able to continue making property updates and purging snapshots. As a result, the MetadataLog grew past 100 entries (the default retention size). https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L165

We walked back recursively through the TableMetadata files listed in each TableMetadata's MetadataLog until we found the entry matching the path of the first corrupt TableMetadata.

Then we updated the Metastore to reference the path of the MetadataLog entry immediately before the corrupt MetadataLog entry. Afterwards, we reset upstream state (Kafka offsets) to a point before this entry's timestamp.
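For anyone attempting the same recovery, here is a minimal sketch of the walk described above. It is not Iceberg's API: `load` is a hypothetical callable that returns a metadata file parsed as a dict, `find_entry_before` is an illustrative name, and the `metadata-log` / `metadata-file` / `timestamp-ms` field names are assumed to follow the table metadata JSON layout, with log entries ordered oldest first. It also assumes the older metadata files still exist in storage.

```python
def find_entry_before(load, current_path, corrupt_path):
    """Return the metadata-log entry immediately before corrupt_path, or None.

    load: hypothetical callable, path -> parsed metadata JSON (dict).
    Assumes each file's "metadata-log" lists previous metadata files,
    oldest first, and may be truncated to a retention limit.
    """
    path = current_path
    visited = set()
    while path not in visited:
        visited.add(path)
        log = load(path).get("metadata-log", [])
        paths = [entry["metadata-file"] for entry in log]
        if corrupt_path in paths:
            index = paths.index(corrupt_path)
            if index > 0:
                return log[index - 1]
            # The corrupt entry is the oldest one retained here, so its
            # predecessor is only recorded in the corrupt file's own log.
            older = load(corrupt_path).get("metadata-log", [])
            return older[-1] if older else None
        if not log:
            return None  # reached the first metadata file without a match
        path = paths[0]  # hop to the oldest retained entry and keep walking
    return None
```

The returned entry's `metadata-file` is the path the Metastore would be pointed at, and its `timestamp-ms` gives the cutoff for resetting upstream offsets.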

Should catalogs support this type of operation? Or should Iceberg assume state will never get corrupted and only support auditable rollback?

Note: Quirks about this table: It's ingested append-only with FastAppend and gets around 10k new snapshots per day. Our control plane periodically truncates it to the last 10k snapshots (we're still working on rolling out ManifestList truncation). It's consumed by Spark jobs using custom snapshot range scans, which are planned from the snapshot history log instead of by recursively hopping up through parent snapshots. Our control plane also uses the snapshot history log in the same way for state tracking and to chunk snapshot ranges for scheduled consumer jobs.
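To illustrate what chunking snapshot ranges from the history log can look like, here is a rough sketch. It is not our actual control plane code: `chunk_snapshot_ranges` is a hypothetical helper, and the `snapshot-id` field name is assumed to follow the `snapshot-log` entries in the table metadata JSON, ordered oldest first.

```python
def chunk_snapshot_ranges(snapshot_log, chunk_size):
    """Split a snapshot history log into inclusive (first_id, last_id) ranges.

    snapshot_log: list of dicts with a "snapshot-id" key, oldest first
    (assumed layout). chunk_size: max snapshots per range.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ids = [entry["snapshot-id"] for entry in snapshot_log]
    ranges = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        ranges.append((chunk[0], chunk[-1]))
    return ranges
```

Because this reads the log linearly, it never has to follow parent pointers, which matters with roughly 10k snapshots per day. It does mean any corrupt snapshot left in the history log ends up inside some consumer's range.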

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] commented 6 months ago

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'