Open sdd opened 1 month ago
@Xuanwo, @liurenjie1024: This is now ready to review, PTAL when you guys get a chance. Looking forward to your feedback 😁
Hi @liurenjie1024 and @Xuanwo - would either of you be able to review this at some point please? I know it's a bit large, sorry. Thanks :-)
Hi, @sdd Thanks for your patience. In fact I already started reviewing it, and it's a little large, so it may take some time.
Hey @liurenjie1024 - sorry to make changes whilst you are reviewing. I updated the design of the `DeleteFileManager` as I was not happy with it.
Thanks so much for the review on this @liurenjie1024 - I've been ill for the past week or two so I've not had chance to work through your review in detail yet. I just wanted to let you know I've seen it and will pick it up when I've recovered. 👍
Hi, @sdd Sorry to hear that, take care of yourself! Don't worry about this, I'll be happy to discuss this with you anytime when you're back.
This PR adds support for handling both positional and equality delete files within table scans.

The approach taken is to include a list of delete file paths in every `FileScanTask`. At the moment it is assumed that this list refers to delete files that *may* apply to the data file in the scan, rather than having been filtered to only contain delete files that definitely do apply to this data file. Further optimisation of `plan_files` is expected in the future to ensure that this list is pre-filtered before being included in the `FileScanTask`.

The arrow reader now has the responsibility of retrieving the applicable delete files from `FileIO`, as well as the data file. Thanks to the Object Cache already being in place, if there are file scan tasks being handled in parallel in the same process that both attempt to load the same delete file, the Object Cache should ensure that only a single load and parse occurs. (I've just realised that storing data files in the Object Cache is in my local fork: we only store manifests and manifest lists in the Object Cache in this repo! I can create an issue for adding this in a follow-up PR, as it brings big performance improvements in many situations.)

The approach taken for handling each type of delete file is as follows:
Positional Deletes
Positional Delete support is implemented using the `RowSelection` functionality of the parquet arrow reader that we are already using. The list of applicable positional delete indices is turned into a `RowSelection`. If the scan already has `enable_row_selection` enabled and there is a scan filter predicate, then the `RowSelection` from this is intersected with the positional delete `RowSelection` to yield a single combined `RowSelection`.

NB: this implementation relies on all data files in the table having been written with parquet page indexes in order for positional delete file application to succeed. This seems to be the case for parquet files written by Spark or Iceberg Java, but not by pyiceberg. In scenarios where positional delete files are encountered but one or more of the data files that they apply to does not contain a page index, the scan will return an error. This is at least an improvement on the status quo, where positional delete files cause a scan to fail in all circumstances; for consumers whose parquet files are written with page indexes, this will not be an issue.
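To illustrate the positional-delete-to-row-selection conversion described above, here's a minimal self-contained sketch. Note that `Run`, `deletes_to_selection`, and their signatures are illustrative stand-ins, not the parquet crate's actual `RowSelector`/`RowSelection` types (which also merge adjacent runs and support intersection with a filter-derived selection):

```rust
// Illustrative sketch only: a stand-in for the parquet crate's
// select/skip run-length encoding, showing how a sorted list of
// positional delete indices maps onto a row selection.
#[derive(Debug, PartialEq)]
enum Run {
    Select(usize), // read this many rows
    Skip(usize),   // skip this many rows (deleted)
}

/// Convert sorted, deduplicated positional-delete indices into
/// alternating select/skip runs covering `num_rows` rows.
fn deletes_to_selection(deleted: &[usize], num_rows: usize) -> Vec<Run> {
    let mut runs = Vec::new();
    let mut cursor = 0;
    for &pos in deleted {
        if pos >= num_rows {
            break;
        }
        if pos > cursor {
            // Rows between the previous delete and this one survive.
            runs.push(Run::Select(pos - cursor));
        }
        // The deleted row itself is skipped.
        runs.push(Run::Skip(1));
        cursor = pos + 1;
    }
    if cursor < num_rows {
        // Trailing rows after the last delete survive.
        runs.push(Run::Select(num_rows - cursor));
    }
    runs
}

fn main() {
    // Deletes at rows 2 and 3 of a 6-row file:
    // select rows 0-1, skip 2, skip 3, select rows 4-5.
    let runs = deletes_to_selection(&[2, 3], 6);
    assert_eq!(
        runs,
        vec![Run::Select(2), Run::Skip(1), Run::Skip(1), Run::Select(2)]
    );
    println!("{:?}", runs);
}
```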
Equality Deletes
All rows from all applicable equality delete files are combined together to create a single `BoundPredicate`. If the scan also has a filter predicate, this is ANDed with the delete predicate to form a final combined `BoundPredicate` that is used as before to construct the arrow `RowFilter`, and is also used in row group filtering.

Update

I added Equality Delete handling into this PR as it only made a difference of about 350 lines.
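For anyone reviewing the equality delete logic, here's a self-contained sketch of the predicate combination described above. The `Predicate` enum, `build_delete_predicate`, and `eval` here are illustrative only, not iceberg-rust's actual `BoundPredicate` API; the point is just the semantics: a row survives if it matches *no* equality-delete row, so the deletes become `NOT(row1 OR row2 OR ...)`, ANDed with the scan filter:

```rust
// Illustrative sketch only: not iceberg-rust's real Predicate types.
#[derive(Debug)]
enum Predicate {
    Eq(&'static str, i64), // column == value
    And(Box<Predicate>, Box<Predicate>),
    Or(Box<Predicate>, Box<Predicate>),
    Not(Box<Predicate>),
    AlwaysTrue,
}

/// Each equality-delete row means "delete rows matching all of these
/// (column, value) pairs". A row survives if it matches no delete row,
/// so the combined delete predicate is NOT(row1 OR row2 OR ...).
fn build_delete_predicate(rows: &[Vec<(&'static str, i64)>]) -> Predicate {
    let mut matches_any: Option<Predicate> = None;
    for row in rows {
        let mut row_pred: Option<Predicate> = None;
        for &(col, val) in row {
            let eq = Predicate::Eq(col, val);
            row_pred = Some(match row_pred {
                Some(p) => Predicate::And(Box::new(p), Box::new(eq)),
                None => eq,
            });
        }
        if let Some(rp) = row_pred {
            matches_any = Some(match matches_any {
                Some(p) => Predicate::Or(Box::new(p), Box::new(rp)),
                None => rp,
            });
        }
    }
    match matches_any {
        Some(p) => Predicate::Not(Box::new(p)),
        None => Predicate::AlwaysTrue,
    }
}

/// Evaluate a predicate against a row of (column, value) pairs.
fn eval(p: &Predicate, row: &[(&'static str, i64)]) -> bool {
    match p {
        Predicate::Eq(c, v) => row.iter().any(|&(rc, rv)| rc == *c && rv == *v),
        Predicate::And(a, b) => eval(a, row) && eval(b, row),
        Predicate::Or(a, b) => eval(a, row) || eval(b, row),
        Predicate::Not(a) => !eval(a, row),
        Predicate::AlwaysTrue => true,
    }
}

fn main() {
    // Two equality-delete rows: (id = 1) and (id = 2 AND region = 7).
    let delete = build_delete_predicate(&[
        vec![("id", 1)],
        vec![("id", 2), ("region", 7)],
    ]);
    assert!(!eval(&delete, &[("id", 1), ("region", 7)])); // deleted
    assert!(!eval(&delete, &[("id", 2), ("region", 7)])); // deleted
    assert!(eval(&delete, &[("id", 2), ("region", 8)])); // survives
    assert!(eval(&delete, &[("id", 3), ("region", 7)])); // survives

    // If the scan has its own filter, AND it with the delete predicate.
    let scan_filter = Predicate::Eq("region", 7);
    let combined = Predicate::And(Box::new(scan_filter), Box::new(delete));
    println!("{:?}", combined);
}
```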