apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

AWS Glue Apache Iceberg Data Recovery #11077

Open SamRaza356 opened 2 months ago

SamRaza356 commented 2 months ago

Query engine

AWS ATHENA

Question

I did a full migration of an Iceberg table into another, isolated table. Issue: a deletion done in the first Iceberg table is not reflected in the second (migrated) Iceberg table, although the data and metadata are fully copied and the pointer indicates the latest metadata_location.

CaptRick commented 2 months ago

Hi @SamRaza356,

I see the issue you're facing with the deletion not being reflected in the migrated Iceberg table despite having fully copied the data and metadata.

One potential reason for this could be related to how Iceberg manages snapshots and metadata pointers. Even though the metadata_location is updated, the new table might still be referencing an older snapshot that doesn't include the deletions.

To resolve this, you could try the following steps:

Check Snapshots: Verify that the latest snapshot in the new table includes the deletions. You can query the table's snapshots metadata table to confirm this (in Athena, something like SELECT * FROM "db"."table$snapshots", with your own database and table names).

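Force a Table Refresh: Make sure the query engine is actually reading the latest metadata. In Spark you can run REFRESH TABLE. As far as I know, Athena has no ALTER TABLE ... REFRESH statement and resolves an Iceberg table through the metadata_location stored in the Glue Data Catalog, so with Athena the thing to verify is that catalog entry rather than an engine-side cache.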

Recheck Metadata Location: Double-check that the metadata_location truly points to the correct and most recent metadata file. Any discrepancies here could cause the issue you're seeing.
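
As a rough illustration of the snapshot and metadata checks above, here is a minimal Python sketch that inspects an already-parsed Iceberg metadata JSON file (the file that metadata_location points to). The field names (current-snapshot-id, snapshots, summary.operation, manifest-list) come from the Iceberg table spec. The function name and the sample values are hypothetical, and downloading the file from S3 is left out.

```python
from typing import Any, Dict, Optional


def describe_current_snapshot(metadata: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a summary of the snapshot that current-snapshot-id points to.

    Returns None when the metadata has no matching snapshot entry.
    """
    current_id = metadata.get("current-snapshot-id")
    for snapshot in metadata.get("snapshots", []):
        if snapshot.get("snapshot-id") == current_id:
            return {
                "snapshot_id": current_id,
                "operation": snapshot.get("summary", {}).get("operation"),
                "manifest_list": snapshot.get("manifest-list"),
            }
    return None


# Hypothetical metadata, shaped like an Iceberg metadata.json file.
sample_metadata = {
    "location": "s3://example-bucket/source_table",
    "current-snapshot-id": 2,
    "snapshots": [
        {
            "snapshot-id": 1,
            "summary": {"operation": "append"},
            "manifest-list": "s3://example-bucket/source_table/metadata/snap-1.avro",
        },
        {
            "snapshot-id": 2,
            "summary": {"operation": "delete"},
            "manifest-list": "s3://example-bucket/source_table/metadata/snap-2.avro",
        },
    ],
}

current = describe_current_snapshot(sample_metadata)
print(current)
```

If the snapshot returned for the migrated table is not the one that recorded the deletion in the source table, the copied metadata file is probably not the latest one.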

If these steps don't resolve the problem, it might be useful to look into any potential caching mechanisms or inconsistencies between the source and target environments.
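
One inconsistency that may be worth ruling out: Iceberg metadata stores full file paths, so metadata copied as-is can still point at the source table's location. The sketch below is only an illustration under that assumption. It lists the table location and manifest-list paths that fall outside the prefix you expect for the target table. The function name, bucket, and paths are hypothetical.

```python
from typing import Any, Dict, List


def paths_outside_prefix(metadata: Dict[str, Any], expected_prefix: str) -> List[str]:
    """List the table location and manifest-list paths not under expected_prefix."""
    # Normalise with a trailing slash so "target_table" does not match "target_table_old".
    prefix = expected_prefix.rstrip("/") + "/"
    candidates = [metadata.get("location", "")]
    candidates += [s.get("manifest-list", "") for s in metadata.get("snapshots", [])]
    return [p for p in candidates if p and not (p.rstrip("/") + "/").startswith(prefix)]


# Hypothetical metadata copied unchanged from the source table.
copied_metadata = {
    "location": "s3://example-bucket/source_table",
    "snapshots": [
        {
            "snapshot-id": 1,
            "manifest-list": "s3://example-bucket/source_table/metadata/snap-1.avro",
        },
    ],
}

stale = paths_outside_prefix(copied_metadata, "s3://example-bucket/target_table")
print(stale)
```

A non-empty result would suggest that the migrated table still reads files from the source location, which could explain the two tables disagreeing about what was deleted.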

Let me know if you need further assistance, and I'd be happy to help!