wooyeong opened this issue 9 months ago (status: Open)
@wooyeong: Nice catch. Yes, the optimizer treats both as identical queries and merges them, because it is not aware of the snapshot ID / ref information.
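To make the failure mode concrete, here is a deliberately simplified model. It is not Spark's actual optimizer rule, and the class names and the table name `db.history` are hypothetical. It assumes only that the optimizer reuses a scan whenever the table objects compare equal. If `equals()` looks at the table name alone, two time-travel reads of different snapshots collapse into a single scan:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Simplified model (NOT Spark's real optimizer): scans whose table handles are
// equal() are treated as the same relation, and only the first one is kept.
public class NameOnlyMergeDemo {

  // Hypothetical stand-in for a table handle whose equality uses the name only.
  static final class NameOnlyTable {
    final String name;
    final Long snapshotId; // time-travel target, ignored by equals()/hashCode()

    NameOnlyTable(String name, Long snapshotId) {
      this.name = name;
      this.snapshotId = snapshotId;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof NameOnlyTable)) return false;
      return name.equals(((NameOnlyTable) o).name);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name);
    }
  }

  // Deduplicate scans by table equality, keeping the first handle seen.
  static List<NameOnlyTable> mergeEqualScans(List<NameOnlyTable> scans) {
    Map<NameOnlyTable, NameOnlyTable> unique = new LinkedHashMap<>();
    for (NameOnlyTable scan : scans) {
      unique.putIfAbsent(scan, scan);
    }
    return new ArrayList<>(unique.values());
  }

  public static void main(String[] args) {
    List<NameOnlyTable> scans = List.of(
        new NameOnlyTable("db.history", 1L),
        new NameOnlyTable("db.history", 2L));
    List<NameOnlyTable> merged = mergeEqualScans(scans);
    // Prints "1 scan(s), snapshot 1": the read of snapshot 2 was merged away.
    System.out.println(merged.size() + " scan(s), snapshot " + merged.get(0).snapshotId);
  }
}
```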
I confirmed that the query works after changing `SparkTable#equals` to compare `branch` and `snapshotId` as well as `name`, so that each ref has a unique canonical form.
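A minimal sketch of the equality described above, assuming a hypothetical `RefAwareTable` handle that carries the same three fields (this is not the actual `SparkTable` source). Both `equals()` and `hashCode()` include `name`, `branch` and `snapshotId`, so the two must change together to keep the Java equality contract:

```java
import java.util.Objects;

// Hypothetical table handle: two handles are equal only if the table name,
// the requested branch and the requested snapshot ID all match.
public class RefAwareTable {
  final String name;
  final String branch;   // null when no branch was requested
  final Long snapshotId; // null when no snapshot was requested

  RefAwareTable(String name, String branch, Long snapshotId) {
    this.name = Objects.requireNonNull(name);
    this.branch = branch;
    this.snapshotId = snapshotId;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof RefAwareTable)) return false;
    RefAwareTable that = (RefAwareTable) o;
    return name.equals(that.name)
        && Objects.equals(branch, that.branch)
        && Objects.equals(snapshotId, that.snapshotId);
  }

  @Override
  public int hashCode() {
    // Must use the same fields as equals().
    return Objects.hash(name, branch, snapshotId);
  }

  public static void main(String[] args) {
    RefAwareTable snap1 = new RefAwareTable("db.history", null, 1L);
    RefAwareTable snap2 = new RefAwareTable("db.history", null, 2L);
    RefAwareTable viaBranch = new RefAwareTable("db.history", "main", null);
    System.out.println(snap1.equals(snap2));     // false: different snapshots stay separate
    System.out.println(snap1.equals(viaBranch)); // false, even if "main" points at snapshot 1
  }
}
```

The second comparison shows the trade-off of this approach: a branch and an explicit snapshot ID that resolve to the same snapshot are still treated as different tables.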
I think this will miss an edge case. For example, if there are two time-travel queries on the same table, one using a branch name and one using a snapshot ID, and both map to the same snapshot ID, they should be considered the same. With the change above, however, they would be considered different.
I think the fix should be to find the effective snapshot ID (`icebergTable.snapshot(branch)` if `branch` is not null, or `snapshotId` if it is not null) and compare the effective snapshot ID in `equals()`.
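One way to sketch this suggestion, under stated assumptions: the class `EffectiveSnapshotTable` is hypothetical, and a plain `Map` from branch name to snapshot ID stands in for resolving `icebergTable.snapshot(branch)` against real Iceberg metadata. The handle resolves its effective snapshot ID once, then compares on the table name plus that ID:

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical handle that compares by *effective* snapshot ID rather than by
// the ref used to request it. The refs map (branch name -> snapshot ID) is an
// assumed stand-in for looking up the branch in Iceberg table metadata.
public class EffectiveSnapshotTable {
  final String name;
  final Long effectiveSnapshotId; // null means no time travel was requested

  EffectiveSnapshotTable(String name, String branch, Long snapshotId, Map<String, Long> refs) {
    this.name = Objects.requireNonNull(name);
    // Resolved once, so hashCode() stays stable even if the branch moves later.
    this.effectiveSnapshotId = resolve(branch, snapshotId, refs);
  }

  // Use the branch's snapshot if a branch is set, else the explicit snapshot ID.
  static Long resolve(String branch, Long snapshotId, Map<String, Long> refs) {
    if (branch != null) {
      Long id = refs.get(branch);
      if (id == null) {
        throw new IllegalArgumentException("Unknown branch: " + branch);
      }
      return id;
    }
    return snapshotId;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof EffectiveSnapshotTable)) return false;
    EffectiveSnapshotTable that = (EffectiveSnapshotTable) o;
    return name.equals(that.name)
        && Objects.equals(effectiveSnapshotId, that.effectiveSnapshotId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(name, effectiveSnapshotId);
  }

  public static void main(String[] args) {
    Map<String, Long> refs = Map.of("main", 1L, "audit", 2L);
    EffectiveSnapshotTable viaBranch = new EffectiveSnapshotTable("db.history", "main", null, refs);
    EffectiveSnapshotTable viaSnapshot = new EffectiveSnapshotTable("db.history", null, 1L, refs);
    EffectiveSnapshotTable other = new EffectiveSnapshotTable("db.history", null, 2L, refs);
    System.out.println(viaBranch.equals(viaSnapshot)); // true: both resolve to snapshot 1
    System.out.println(viaSnapshot.equals(other));     // false: snapshots 1 and 2 differ
  }
}
```

In this sketch a plain read with no branch or snapshot keeps a null effective ID, so it is never merged with a time-travel read. That is a conservative choice: it may miss some reuse, but it never merges two different snapshots.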
Would you like to contribute the fix?
@ajantha-bhat Thank you for your comment. I'll give it a try soon.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Apache Iceberg version
1.4.2 (latest release)
Query engine
Spark
Please describe the bug 🐞
I was trying to build a history table using the time travel feature. For example, thanks to time travel, I could insert just one row for the date 2024-01-02.
However, I ran into a strange bug that prevented me from building a complete set of historical data. I've pasted reproducible queries and the possible cause below.
Queries
Query plan
Cause
Possible fix
Being able to use this feature would save me huge storage costs. I look forward to your advice. Thanks in advance.