zinking opened this issue 10 months ago
@RussellSpitzer, any comments?
I am seeing v2 tables (partitioned tables) that retain delete files in partitions even though those delete files no longer apply to any data files within that partition.
This is mentioned in https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_position_delete_files as the "dangling delete" problem. We can't tell whether a delete file still refers to a live data file unless we compare its contents with the live data file paths, as rewrite_position_delete_files does.
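A minimal sketch of that comparison, assuming a simplified, hypothetical model in which each position delete file is reduced to the set of data file paths it references (real position delete files store `file_path`/`pos` rows). `PositionDeleteFile` and `dangling_position_deletes` are illustrative names, not Iceberg APIs:

```python
from dataclasses import dataclass


# Hypothetical, simplified model: a position delete file is reduced to the set
# of data file paths it references (the real file stores (file_path, pos) rows).
@dataclass(frozen=True)
class PositionDeleteFile:
    path: str
    referenced_data_paths: frozenset


def dangling_position_deletes(delete_files, live_data_paths):
    """Return the delete files that reference no live data file.

    This needs the delete file contents (the referenced paths), which is why
    table metadata alone cannot tell whether a delete file is dangling.
    """
    live = set(live_data_paths)
    return [d for d in delete_files if not (d.referenced_data_paths & live)]


deletes = [
    PositionDeleteFile("del-1.parquet", frozenset({"data-a.parquet"})),
    PositionDeleteFile("del-2.parquet", frozenset({"data-b.parquet", "data-c.parquet"})),
]
# Hypothetical state after compaction rewrote data-a and data-c into data-d.
live = ["data-b.parquet", "data-d.parquet"]
print([d.path for d in dangling_position_deletes(deletes, live)])  # ['del-1.parquet']
```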
@manuzhang, that sounds like a different problem. The issue here is not specific to position deletes; equality deletes have the same issue. The key here is the partition:
delete files within a partition have no effect on data files in other partitions.
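To illustrate the partition scoping, here is a simplified applicability check loosely modeled on the Iceberg v2 spec rules. It assumes all files share one partition spec, ignores global (unpartitioned) equality deletes, and omits the `file_path` match that position deletes also require; `DataFile`, `DeleteFile` and `may_apply` are hypothetical names, not Iceberg classes:

```python
from dataclasses import dataclass


# Hypothetical, simplified file models (not Iceberg classes).
@dataclass(frozen=True)
class DataFile:
    path: str
    partition: tuple
    data_seq: int


@dataclass(frozen=True)
class DeleteFile:
    path: str
    partition: tuple
    data_seq: int
    kind: str  # "position" or "equality"


def may_apply(delete: DeleteFile, data: DataFile) -> bool:
    """Simplified check of whether a delete file can affect a data file.

    Assumes a single partition spec and no global equality deletes; the
    file_path match required for position deletes is omitted.
    """
    if delete.partition != data.partition:
        # Partition-scoped deletes never affect other partitions,
        # whether they are position or equality deletes.
        return False
    if delete.kind == "equality":
        # Equality deletes apply only to strictly older data.
        return data.data_seq < delete.data_seq
    # Position deletes apply to data with an equal or older sequence number.
    return data.data_seq <= delete.data_seq


eq_del = DeleteFile("eq-del.parquet", ("2024-01-01",), 5, "equality")
same_part = DataFile("a.parquet", ("2024-01-01",), 3)
other_part = DataFile("b.parquet", ("2024-01-02",), 3)
print(may_apply(eq_del, same_part), may_apply(eq_del, other_part))  # True False
```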
@zinking, I see. In an extreme case, if a single partition is left uncompacted, none of the other partitions can drop their delete files after compaction.
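A small sketch of that extreme case, assuming the drop rule is "a delete file can be removed once its data sequence number is below the minimum data sequence number of all live data files", computed table-wide. The file names and sequence numbers are made up, and `TrackedFile` and `droppable_table_wide` are hypothetical, not Iceberg code:

```python
from dataclasses import dataclass


# Hypothetical, simplified model: only partition and data sequence number matter here.
@dataclass(frozen=True)
class TrackedFile:
    path: str
    partition: str
    data_seq: int


def droppable_table_wide(data_files, delete_files):
    """Delete files older than the table-wide minimum data sequence number.

    A delete whose data sequence number is below that of every live data file
    cannot apply to any of them (position or equality). With no live data
    files, every delete file is droppable.
    """
    min_seq = min((f.data_seq for f in data_files), default=float("inf"))
    return [d.path for d in delete_files if d.data_seq < min_seq]


# "p1" was compacted (new data at seq 10); "p2" still holds an old file at
# seq 1, which pins the table-wide minimum to 1.
data = [TrackedFile("p1-compacted.parquet", "p1", 10), TrackedFile("p2-old.parquet", "p2", 1)]
deletes = [TrackedFile("p1-del.parquet", "p1", 5), TrackedFile("p2-del.parquet", "p2", 3)]
print(droppable_table_wide(data, deletes))  # []
```

Even though `p1-del.parquet` can no longer affect anything in `p1`, the single uncompacted file in `p2` keeps it alive.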
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Apache Iceberg version
1.4.2 (latest release)
Query engine
Spark
Please describe the bug 🐞
The minDataSequenceNumber is calculated table-wide, but in theory, shouldn't it be calculated per partition? A delete file within one partition only applies to data files in that partition.
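A sketch of the per-partition alternative, under the same simplified drop rule (a delete is droppable when its data sequence number is below the minimum live data sequence number in scope) and assuming every file belongs to a single partition spec; global equality deletes from an unpartitioned spec would still need the table-wide minimum. All names and numbers are hypothetical:

```python
from dataclasses import dataclass


# Hypothetical, simplified model (not Iceberg classes).
@dataclass(frozen=True)
class TrackedFile:
    path: str
    partition: str
    data_seq: int


def min_data_seq_by_partition(data_files):
    """Minimum data sequence number of live data files, per partition."""
    mins = {}
    for f in data_files:
        mins[f.partition] = min(mins.get(f.partition, f.data_seq), f.data_seq)
    return mins


def droppable_per_partition(data_files, delete_files):
    mins = min_data_seq_by_partition(data_files)
    # A partition with no live data files gets +inf, so all its deletes are droppable.
    return [d.path for d in delete_files if d.data_seq < mins.get(d.partition, float("inf"))]


data = [TrackedFile("p1-compacted.parquet", "p1", 10), TrackedFile("p2-old.parquet", "p2", 1)]
deletes = [TrackedFile("p1-del.parquet", "p1", 5), TrackedFile("p2-del.parquet", "p2", 3)]
print(droppable_per_partition(data, deletes))  # ['p1-del.parquet']
```

Here the old file in `p2` only holds back deletes in `p2`, so the compacted partition `p1` can drop its delete file.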