Netflix / iceberg

Iceberg is a table format for large, slow-moving tabular data
Apache License 2.0
476 stars, 59 forks

Update ReplaceFiles to use MergingSnapshotUpdate. #84

Closed: rdblue closed this 5 years ago

rdblue commented 5 years ago

This changes the implementation of ReplaceFiles. Previously, ReplaceFiles was built on BaseReplaceFiles, which nothing else used. Now it is built on MergingSnapshotUpdate, the same base class used for deletes, merge appends, and overwrites.

The new implementation adds automatic merging when replacing files and takes advantage of caching that makes retries much faster.

To use MergingSnapshotUpdate for ReplaceFiles, this adds a mode that fails the operation when any path explicitly marked for deletion is not found in the table's current manifests. Filtered manifests can still be cached and reused in this mode because the files deleted from each filtered manifest are tracked.
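For readers who want a concrete picture of the strict-delete mode and the filtered-manifest cache, here is a minimal, self-contained sketch. It is not the code in this PR: the class `StrictDeleteFilterSketch`, its methods, and the representation of a manifest as a named list of file paths are all hypothetical, chosen only to illustrate the idea under the assumption that a manifest's contents never change once written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names and structure are assumptions,
// not the MergingSnapshotUpdate implementation.
public class StrictDeleteFilterSketch {
  private final Set<String> deletePaths = new HashSet<>();
  // Filtered contents per manifest name, reused across retries.
  private final Map<String, List<String>> filteredCache = new HashMap<>();
  // Files removed from each manifest, recorded when it was filtered.
  private final Map<String, Set<String>> deletedByManifest = new HashMap<>();
  private boolean failMissingDeletePaths = false;
  private int manifestsFiltered = 0;

  public void delete(String path) {
    deletePaths.add(path);
    // A changed delete set invalidates earlier filtering results.
    filteredCache.clear();
    deletedByManifest.clear();
  }

  public void failMissingDeletePaths() {
    this.failMissingDeletePaths = true;
  }

  // Number of manifests actually scanned (cache misses).
  public int manifestsFiltered() {
    return manifestsFiltered;
  }

  private List<String> filterManifest(String name, List<String> files) {
    List<String> cached = filteredCache.get(name);
    if (cached != null) {
      return cached; // retry path: no rescan needed
    }
    manifestsFiltered++;
    List<String> kept = new ArrayList<>();
    Set<String> deleted = new HashSet<>();
    for (String file : files) {
      if (deletePaths.contains(file)) {
        deleted.add(file);
      } else {
        kept.add(file);
      }
    }
    filteredCache.put(name, kept);
    deletedByManifest.put(name, deleted);
    return kept;
  }

  // Filters every manifest, then validates that all requested deletes were found.
  public Map<String, List<String>> apply(Map<String, List<String>> manifests) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    Set<String> found = new HashSet<>();
    for (Map.Entry<String, List<String>> entry : manifests.entrySet()) {
      result.put(entry.getKey(), filterManifest(entry.getKey(), entry.getValue()));
      // Tracked deletes let validation work even when the manifest came from cache.
      found.addAll(deletedByManifest.get(entry.getKey()));
    }
    if (failMissingDeletePaths && !found.containsAll(deletePaths)) {
      Set<String> missing = new HashSet<>(deletePaths);
      missing.removeAll(found);
      throw new IllegalStateException("Missing required files to delete: " + missing);
    }
    return result;
  }
}
```

In this sketch, calling `apply` a second time after a new manifest has been added scans only the new manifest. The strict check still sees every deleted path because the per-manifest delete sets were recorded when each manifest was first filtered.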

rdblue commented 5 years ago

@Parth-Brahmbhatt, here's the update to ReplaceFiles if you want to take a look.