danielhumanmod opened 3 weeks ago
Hi @the-other-tim-brown, based on my investigation, both Iceberg and Delta support storing commit-level information. Here's a summary of the findings:
To take advantage of these capabilities, some code adjustments may be needed in both the Iceberg and Delta integrations. I'll start working on a proof of concept to explore this and will get back to you once it's complete.
What is the purpose of the pull request
Previously, if a rollback/restore occurred in the source table, XTable reflected it in the target table as file changes (files added or deleted). This PR improves on that by issuing a rollback command in the target table(s), which keeps the source and target histories more consistent. It is also more efficient, because the target can be restored directly to a specific version/snapshot instead of computing a potentially large diff against the table's current state.
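As a rough illustration of why restoring directly can be cheaper, here is a toy in-memory model of a target table that keeps every snapshot addressable (similar in spirit to Iceberg snapshots). `VersionedTableSketch` and its methods are hypothetical, not XTable, Iceberg, or Delta APIs. In this model, rolling back only moves the current-snapshot pointer, whereas the diff-based approach would have to compare file lists and commit the differences as a new snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy, in-memory model of a target table that keeps every snapshot addressable.
// Hypothetical: not XTable's, Iceberg's, or Delta's real API.
final class VersionedTableSketch {

    // Snapshot id == index in this list.
    private final List<Set<String>> snapshots = new ArrayList<>();
    private int current = -1;

    // Commits a new snapshot holding the given data files and makes it current.
    int commit(Set<String> files) {
        snapshots.add(Set.copyOf(files));
        current = snapshots.size() - 1;
        return current;
    }

    // Rollback: re-point "current" at an existing snapshot.
    // No file lists are read or compared, and no new snapshot is written.
    void rollbackTo(int snapshotId) {
        if (snapshotId < 0 || snapshotId >= snapshots.size()) {
            throw new IllegalArgumentException("unknown snapshot " + snapshotId);
        }
        current = snapshotId;
    }

    int currentSnapshotId() {
        return current;
    }

    Set<String> currentFiles() {
        return snapshots.get(current);
    }

    int snapshotCount() {
        return snapshots.size();
    }
}
```

For example, after committing `{a.parquet}` and then `{a.parquet, b.parquet}`, `rollbackTo(0)` makes `{a.parquet}` current again without touching any file lists, while the diff-based approach would instead commit a third snapshot that removes `b.parquet`.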
This is the first part of this enhancement (1/2). It focuses primarily on detecting whether a rollback/restore occurred in the source table and verifying whether the corresponding commit exists in the target table.
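A minimal sketch of the detection step, assuming each source commit can be mapped to an operation type plus, for rollbacks/restores, the identifier of the commit the table was reverted to. The `SourceCommit` model, the `Operation` values, and `detectRollback` are hypothetical; each source format surfaces this information differently. The returned identifier is what would then be looked up in the target table.

```java
import java.util.Optional;

// Hypothetical, format-agnostic model of a source commit. How a real source format
// (Hudi, Iceberg, Delta) exposes rollbacks/restores differs; this mapping is assumed.
final class RollbackDetectionSketch {

    enum Operation { APPEND, OVERWRITE, DELETE, ROLLBACK, RESTORE }

    // restoredTo: identifier of the earlier source commit the table was reverted to;
    // expected to be null for commits that are not rollbacks/restores.
    record SourceCommit(String id, Operation operation, String restoredTo) {}

    // Returns the restore target if this commit is a rollback or restore, else empty.
    static Optional<String> detectRollback(SourceCommit commit) {
        boolean revertsTable = commit.operation() == Operation.ROLLBACK
                || commit.operation() == Operation.RESTORE;
        return revertsTable ? Optional.ofNullable(commit.restoredTo()) : Optional.empty();
    }
}
```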
Brief change log
Additional Info
Source Identifier
The source identifier represents the mapping between the source and target formats: given a source commit's identifier, we can use it to find the corresponding commit in the target table.
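One way to picture this mapping, assuming each target commit records its source identifier in its commit-level metadata (which both Iceberg and Delta support), is to build an index from source identifier to target commit and look it up. The property key `xtable.source.identifier` and all class and method names below are hypothetical, not XTable's actual implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class SourceIdentifierSketch {

    // Hypothetical key under which a target commit records the source commit it came from.
    static final String SOURCE_ID_KEY = "xtable.source.identifier";

    // A target-format commit (e.g. an Iceberg snapshot or a Delta version)
    // together with its commit-level properties.
    record TargetCommit(String targetId, Map<String, String> properties) {}

    // Builds a lookup from source identifier to target commit id.
    // Target commits without the property are skipped.
    static Map<String, String> indexBySourceId(List<TargetCommit> targetHistory) {
        Map<String, String> index = new HashMap<>();
        for (TargetCommit commit : targetHistory) {
            String sourceId = commit.properties().get(SOURCE_ID_KEY);
            if (sourceId != null) {
                index.put(sourceId, commit.targetId());
            }
        }
        return index;
    }

    // Finds the target commit that corresponds to a given source commit, if any.
    static Optional<String> findTargetCommit(Map<String, String> index, String sourceId) {
        return Optional.ofNullable(index.get(sourceId));
    }
}
```

Target commits without the property (for example, ones written by other tools) are simply skipped, so a missing entry is the case that leads to the fallback path.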
Fallback scenarios
A fallback happens when a rollback or restore is detected in the source table but the corresponding commit is not found in the target table. We still use the rollback information from the source, but that round of sync is applied to the target table as file changes, following the previous behavior.
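The fallback decision can be sketched as follows, assuming the restored source state is available as a set of data-file paths and that an index maps source identifiers to target commits. `FallbackSketch`, `SyncPlan`, `RollbackTo`, and `FileChanges` are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of choosing between a target rollback and the file-change fallback.
final class FallbackSketch {

    interface SyncPlan {}

    // Preferred path: roll the target back to the commit matching the restored source commit.
    record RollbackTo(String targetCommitId) implements SyncPlan {}

    // Fallback path (previous behavior): express the restore as added and removed files.
    record FileChanges(Set<String> added, Set<String> removed) implements SyncPlan {}

    static SyncPlan plan(String restoredToSourceId,
                         Map<String, String> sourceIdToTargetCommit,
                         Set<String> targetFiles,
                         Set<String> restoredSourceFiles) {
        String targetCommit = sourceIdToTargetCommit.get(restoredToSourceId);
        if (targetCommit != null) {
            return new RollbackTo(targetCommit);
        }
        // No matching target commit: use the restored source state,
        // but sync it as a file diff against the target's current files.
        Set<String> added = new HashSet<>(restoredSourceFiles);
        added.removeAll(targetFiles);
        Set<String> removed = new HashSet<>(targetFiles);
        removed.removeAll(restoredSourceFiles);
        return new FileChanges(added, removed);
    }
}
```

If the restored commit has a target counterpart, the plan is a single rollback; otherwise it carries the added and removed files, as before.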
Here’s an example:
In this case, we cannot guarantee complete metadata consistency between the source and target, but reusing the source's rollback information still saves some computation.
Verify this pull request
This pull request is already covered by existing tests; all existing tests should pass.