apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0
6.22k stars 2.17k forks source link

Does main branch reference reset requiring a clean up of snapshot logs #11109

Open haizhou-zhao opened 2 weeks ago

haizhou-zhao commented 2 weeks ago

Query engine

Spark

Question

Background

This is another Hive/Hadoop and REST Catalog behavior discrepancies discovered while enabling integration test on REST catalog. The assumption here is all the existing Spark integration tests should pass as-is on REST Catalog just like how they would pass using Hive & Hadoop Catalog, because conceptually Spark expects the same behavior out of Iceberg, no matter what catalog types are used. Reference issue: https://github.com/apache/iceberg/issues/11079

What

https://github.com/apache/iceberg/blob/main/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java#L813

This integration test passes on Hive & Hadoop Catalog, but does not pass on REST Catalog reference implementation (RESTCatalogAdapter over JdbcCatalog).

Why

Root Cause is REST Catalog reference implementation, comparing to Hive & Hadoop Catalog, will run extra logic on server side when responding to a CREATE OR REPLACE ${table} spark command. A CREATE OR REPLACE ${table} command will trigger a RemoveSnapshotRef (implements MetadataUpdate) change to be sent (within UpdateTableRequest) to REST Catalog server side. When the reference implementation server process this change, it will run removeRef method when building the replacement metadata, within which, snapshot log is cleared: https://github.com/apache/iceberg/blob/113c6e7/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1250. However, Hive and Hadoop Catalog does not run that method when building replacement metadata. At the end of the day, running CREATE OR REPLACE ${table} on top of REST Catalog will result in failure of validating snapshot log is kept intact after CREATE OR REPLACE command.

Questions

  1. Should snapshot history be kept intact after table replacement call based on spec definition, no matter the catalog type used?
  2. On table replacement, If snapshot-log should get cleared, then does snapshots list itself need to be cleared as well?
  3. Should the answer to 2 questions above vary based on catalog types (i.e. they are catalog implementation details, not spec issue) - that Hive & Hadoop catalog are doing the right thing to not clear snapshot-log or snapshots on table replacement; and that since REST catalog implementation is not controlled by this repo, it can choose its implementation details freely (whether clear or not clear snapshot-log & snapshots on replacement). In this case, should we change the REST Catalog reference implementation (RESTCatalogAdapter on JdbcCatalog) so that its behavior is the same as Hive & Hadoop Catalog (where snapshot history is not cleared on replacement)? Or should we change the integration test so that we don't check snapshot history being intact after table replacement?
haizhou-zhao commented 1 week ago

Down to code level detail (root cause of why REST differs from Hadoop/Hive), these two methods, seemingly doing similar things when resetting main branch, but one would clear snapshotLog history, one would not:

TableMetadata$Builder::removeRef will clear snapshot logs on main branch reset: https://github.com/apache/iceberg/blob/f71c7df/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1256

TableMetadata$Builder::resetMainBranch will not clear snapshot logs on main branch reset: https://github.com/apache/iceberg/blob/f71c7df/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1270

Hadoop/Hive catalog only executes TableMetadata$Builder::resetMainBranch on replace table calls, while REST catalog (if server depends on TableMetadata class provided by Open Source release, like reference implementation of RESTAdapter over JdbcCatalog) will run TableMetadata$Builder::removeRef in addition on replace table calls.