delta-io / delta-kernel-rs

A native Delta implementation for integration with any query engine
Apache License 2.0
144 stars, 41 forks

Add methods for constructing `LogSegment` for Snapshot and for TableChanges #495

Closed. OussamaSaoudi-db closed this pull request 1 day ago.

OussamaSaoudi-db commented 1 day ago

What changes are proposed in this pull request?

This PR introduces two methods for constructing a `LogSegment`. The first, `LogSegment::for_snapshot`, builds the `LogSegment` for a `Snapshot`. The second builds one for the upcoming `TableChanges` type.
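To illustrate why the two cases need separate constructors, here is a simplified sketch. It assumes that a snapshot needs the newest checkpoint at or below the target version plus only the commits after it, while a table-changes segment needs every commit in the requested range, with no gaps and no checkpoint, because changes are read commit by commit. `LogFile`, `Segment`, `segment_for_snapshot`, and `segment_for_table_changes` are hypothetical names, not kernel APIs; the real constructors work from a storage listing and return richer types.

```rust
/// Hypothetical, simplified view of one file in `_delta_log/`.
#[derive(Debug, Clone, PartialEq)]
enum LogFile {
    Commit { version: u64 },
    Checkpoint { version: u64 },
}

/// Hypothetical stand-in for a log segment: an optional checkpoint plus the
/// commit versions to replay, in ascending order.
#[derive(Debug, PartialEq)]
struct Segment {
    checkpoint: Option<u64>,
    commits: Vec<u64>,
}

/// Snapshot case: newest checkpoint at or below `end`, then only the commits after it.
fn segment_for_snapshot(files: &[LogFile], end: u64) -> Segment {
    let checkpoint = files
        .iter()
        .filter_map(|f| match f {
            LogFile::Checkpoint { version } if *version <= end => Some(*version),
            _ => None,
        })
        .max();
    let mut commits: Vec<u64> = files
        .iter()
        .filter_map(|f| match f {
            // Commits already folded into the checkpoint are skipped.
            LogFile::Commit { version }
                if *version <= end && checkpoint.map_or(true, |cp| *version > cp) =>
            {
                Some(*version)
            }
            _ => None,
        })
        .collect();
    commits.sort_unstable();
    Segment { checkpoint, commits }
}

/// Table-changes case (assumed): every commit in `start..=end`, no checkpoint, no gaps.
fn segment_for_table_changes(files: &[LogFile], start: u64, end: u64) -> Result<Segment, String> {
    let mut commits: Vec<u64> = files
        .iter()
        .filter_map(|f| match f {
            LogFile::Commit { version } if (start..=end).contains(version) => Some(*version),
            _ => None,
        })
        .collect();
    commits.sort_unstable();
    commits.dedup();
    // A missing commit would silently drop changes, so reject the range instead.
    if commits != (start..=end).collect::<Vec<u64>>() {
        return Err(format!("missing commits between versions {start} and {end}"));
    }
    Ok(Segment { checkpoint: None, commits })
}
```

With commits 0 through 5 and a checkpoint at version 3, the snapshot segment for version 5 is the checkpoint plus commits 4 and 5, whereas the table-changes segment for versions 1 to 4 lists commits 1, 2, 3, and 4 and ignores the checkpoint.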

This PR also refactors the log-listing functions to reduce code duplication: a new function, `get_parsed_log_files_iter`, lists, filters, and parses log files in one place.
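A rough sketch of such a list, parse, and filter pipeline, assuming the Delta protocol's file naming: 20-digit zero-padded versions, `.json` commits, `.checkpoint.parquet` single-part checkpoints, and `.checkpoint.<part>.<num_parts>.parquet` multipart checkpoints. `FileType`, `ParsedLogFile`, `parse_log_file_name`, and `parsed_log_files` are hypothetical; the kernel's `get_parsed_log_files_iter` lists from storage and may differ in signature, ordering, and the set of file types it recognizes.

```rust
/// Hypothetical classification of a `_delta_log/` file name.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FileType {
    Commit,
    SinglePartCheckpoint,
    MultiPartCheckpoint { part_num: u32, num_parts: u32 },
}

#[derive(Debug, Clone, PartialEq)]
struct ParsedLogFile {
    version: u64,
    file_type: FileType,
}

/// Parse a bare file name; returns `None` for anything that is not a commit or
/// checkpoint (e.g. `_last_checkpoint` or `.crc` files).
fn parse_log_file_name(name: &str) -> Option<ParsedLogFile> {
    let (version_str, rest) = name.split_once('.')?;
    if version_str.len() != 20 || !version_str.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let version = version_str.parse().ok()?;
    let parts: Vec<&str> = rest.split('.').collect();
    let file_type = match parts.as_slice() {
        ["json"] => FileType::Commit,
        ["checkpoint", "parquet"] => FileType::SinglePartCheckpoint,
        ["checkpoint", part, total, "parquet"] => FileType::MultiPartCheckpoint {
            part_num: part.parse().ok()?,
            num_parts: total.parse().ok()?,
        },
        _ => return None,
    };
    Some(ParsedLogFile { version, file_type })
}

/// Lazily parse a listing and keep only files with `start <= version <= end`
/// (no upper bound when `end` is `None`).
fn parsed_log_files<I, S>(names: I, start: u64, end: Option<u64>) -> impl Iterator<Item = ParsedLogFile>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    names
        .into_iter()
        .filter_map(|name| parse_log_file_name(name.as_ref()))
        .filter(move |f| f.version >= start && end.map_or(true, |e| f.version <= e))
}
```

Keeping the parsing in one iterator means callers that need different subsets (commits only, or checkpoints plus commits) share the same naming rules instead of re-implementing them.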

It also adds a test helper, `delta_path_for_multipart_checkpoint`, to `test-utils`; it builds the path of a multipart checkpoint file.
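The Delta protocol names each part of a multipart checkpoint `<version>.checkpoint.<part>.<num_parts>.parquet`, with the version zero-padded to 20 digits and both part fields zero-padded to 10 digits. A minimal sketch of a helper that follows that convention is below; `multipart_checkpoint_path` and `multipart_checkpoint_paths` are hypothetical, and the real `delta_path_for_multipart_checkpoint` signature and return type are not shown in this PR.

```rust
/// Hypothetical helper: the `_delta_log/` path of one part of a multipart checkpoint,
/// `<version, 20 digits>.checkpoint.<part, 10 digits>.<num_parts, 10 digits>.parquet`.
fn multipart_checkpoint_path(version: u64, part_num: u32, num_parts: u32) -> String {
    assert!((1..=num_parts).contains(&part_num), "parts are numbered 1..=num_parts");
    format!("_delta_log/{version:020}.checkpoint.{part_num:010}.{num_parts:010}.parquet")
}

/// Paths for every part of one checkpoint, e.g. to write a complete test fixture.
fn multipart_checkpoint_paths(version: u64, num_parts: u32) -> Vec<String> {
    (1..=num_parts)
        .map(|p| multipart_checkpoint_path(version, p, num_parts))
        .collect()
}
```

Dropping one entry from the vector returned by `multipart_checkpoint_paths` is a simple way to produce an incomplete checkpoint for a test.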

This replaces the changes proposed in #457.

How was this change tested?

This change introduces tests for the following:

This PR also adds an ignored test, `build_snapshot_with_missing_checkpoint_part_no_hint`, which checks for desired behaviour: an incomplete checkpoint must not be used in a `LogSegment`. A checkpoint is incomplete if any of the parts given by the `num_parts` field of `LogPathFileType::MultiPartCheckpoint` is missing.
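The completeness rule the ignored test describes can be sketched as follows: group the multipart checkpoint parts by version, treat a version as complete only if its part numbers cover exactly `1..=num_parts`, and fall back to the newest complete version. This is an illustration of the desired behaviour, not the PR's implementation; `CheckpointPart`, `is_complete`, and `latest_complete_checkpoint` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical: (version, part_num, num_parts) for each multipart checkpoint file listed.
type CheckpointPart = (u64, u32, u32);

/// True when the parts for one version agree on `num_parts` and cover every
/// part number in `1..=num_parts`.
fn is_complete(parts: &[(u32, u32)]) -> bool {
    let Some(&(_, num_parts)) = parts.first() else {
        return false;
    };
    if parts.iter().any(|&(_, n)| n != num_parts) {
        return false;
    }
    let seen: BTreeSet<u32> = parts.iter().map(|&(p, _)| p).collect();
    seen.len() == num_parts as usize && seen.iter().all(|p| (1..=num_parts).contains(p))
}

/// Newest checkpoint version whose parts are all present; incomplete ones are skipped.
fn latest_complete_checkpoint(parts: &[CheckpointPart]) -> Option<u64> {
    let mut by_version: BTreeMap<u64, Vec<(u32, u32)>> = BTreeMap::new();
    for &(version, part, num_parts) in parts {
        by_version.entry(version).or_default().push((part, num_parts));
    }
    // Walk versions from newest to oldest and take the first complete one.
    by_version
        .into_iter()
        .rev()
        .find(|(_, ps)| is_complete(ps))
        .map(|(v, _)| v)
}
```

In the "no hint" scenario there is no `_last_checkpoint` file to point at a checkpoint, so the listing itself has to reveal that a part is missing; with parts 1 and 3 of 3 at version 10 and a full two-part checkpoint at version 5, this sketch selects version 5.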

codecov[bot] commented 1 day ago

Codecov Report

Attention: Patch coverage is 90.15625% with 63 lines in your changes missing coverage. Please review.

Project coverage is 80.20%. Comparing base (4ad2f8b) to head (358ed16). Report is 1 commit behind head on main.

| File with missing lines | Patch % | Lines |
|---|---|---|
| `kernel/src/log_segment/tests.rs` | 91.62% | 37 missing |
| `kernel/src/log_segment.rs` | 87.95% | 11 missing and 12 partials |
| `kernel/src/snapshot.rs` | 57.14% | 1 missing and 2 partials |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #495      +/-   ##
==========================================
+ Coverage   79.82%   80.20%   +0.38%
==========================================
  Files          57       58        +1
  Lines       12588    12994      +406
  Branches    12588    12994      +406
==========================================
+ Hits        10048    10422      +374
- Misses       2004     2033       +29
- Partials      536      539        +3
```
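The headline percentages can be reproduced from the raw counts. Project coverage is hits divided by lines, with partial lines not counted as hits (10422 / 12994 gives 80.20%, not the roughly 84% that counting partials would give), and the 63 uncovered patch lines are the missing plus partial lines from the per-file table. The sketch below assumes Codecov's default of truncating to two decimals; the total of 640 patch lines (577 covered) is inferred from 90.15625% and is not stated in the report.

```rust
/// Coverage as a percentage string, truncated to two decimals
/// (assumes Codecov's default rounding: down, precision 2).
fn coverage_pct(hits: u64, lines: u64) -> String {
    // Hundredths of a percent; integer division truncates.
    let hundredths = hits * 10_000 / lines;
    format!("{}.{:02}", hundredths / 100, hundredths % 100)
}

/// Patch lines reported as not covered: missing plus partial lines per file.
fn patch_not_covered() -> u64 {
    [37_u64, 11, 12, 1, 2].iter().sum()
}
```

For example, `coverage_pct(10048, 12588)` yields the base figure of 79.82%, and the 0.38-point change is the difference between the two truncated values.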
