delta-io / delta-kernel-rs

A native Delta implementation for integration with any query engine
Apache License 2.0

Simplify log replay visitor and avoid materializing Add/Remove actions #494

Open scovich opened 2 days ago

scovich commented 2 days ago

NOTE: Stacked on https://github.com/delta-incubator/delta-kernel-rs/pull/481 (ignore first nine commits)

What changes are proposed in this pull request?

The existing log replay logic materialized full Add and Remove actions just to examine the four fields of each that make up the file comparison key. An old (now-unused) flavor of log replay also left behind a lot of indirection and duplication. Finally, the code created a new expression evaluator for each batch instead of reusing a single one for the whole iteration.

Streamline the logic to only visit the columns of interest and generally reduce code bloat.
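To make the idea concrete, here is a minimal, self-contained sketch of column-only log replay. It is not the kernel's actual code: `Batch`, `Row`, `FileKey`, `DedupVisitor`, and `replay` are hypothetical names, and the key shape (path plus an optional deletion-vector id) and newest-first batch order are assumptions made for illustration. The visitor reads only the key columns, never builds a full action struct, and is created once and reused for every batch.

```rust
use std::collections::HashSet;

/// Hypothetical columnar batch holding only the columns the visitor needs.
/// A row is an add if `add_path` is set and a remove if `remove_path` is set;
/// any other action type leaves both as `None`.
struct Batch {
    add_path: Vec<Option<String>>,
    add_dv_id: Vec<Option<String>>,
    remove_path: Vec<Option<String>>,
    remove_dv_id: Vec<Option<String>>,
}

/// Convenience row type used only to build example batches.
enum Row<'a> {
    Add(&'a str, Option<&'a str>),
    Remove(&'a str, Option<&'a str>),
    Other,
}

impl Batch {
    fn from_rows(rows: &[Row<'_>]) -> Self {
        let mut b = Batch {
            add_path: vec![],
            add_dv_id: vec![],
            remove_path: vec![],
            remove_dv_id: vec![],
        };
        for row in rows {
            let (ap, ad, rp, rd) = match row {
                Row::Add(p, dv) => (Some(p.to_string()), dv.map(str::to_string), None, None),
                Row::Remove(p, dv) => (None, None, Some(p.to_string()), dv.map(str::to_string)),
                Row::Other => (None, None, None, None),
            };
            b.add_path.push(ap);
            b.add_dv_id.push(ad);
            b.remove_path.push(rp);
            b.remove_dv_id.push(rd);
        }
        b
    }
}

/// Assumed file comparison key: path plus optional deletion-vector id.
type FileKey = (String, Option<String>);

/// Stateful visitor, created once and reused across all batches.
#[derive(Default)]
struct DedupVisitor {
    seen: HashSet<FileKey>,
}

impl DedupVisitor {
    /// Returns a selection vector: `true` for each add row that survives.
    fn visit(&mut self, batch: &Batch) -> Vec<bool> {
        let rows = batch.add_path.len();
        let mut selected = vec![false; rows];
        for i in 0..rows {
            if let Some(path) = &batch.add_path[i] {
                let key = (path.clone(), batch.add_dv_id[i].clone());
                // `insert` returns true only the first time a key is seen.
                selected[i] = self.seen.insert(key);
            } else if let Some(path) = &batch.remove_path[i] {
                // A remove only records its key so older adds are suppressed.
                let key = (path.clone(), batch.remove_dv_id[i].clone());
                self.seen.insert(key);
            }
        }
        selected
    }
}

/// Replays batches in the given order (assumed newest first) with one visitor.
fn replay(batches: &[Batch]) -> Vec<Vec<bool>> {
    let mut visitor = DedupVisitor::default();
    batches.iter().map(|b| visitor.visit(b)).collect()
}
```

Calling `replay` on a newer batch that removes `a.parquet` followed by an older batch that adds it yields `false` for the older add, because the remove's key was already recorded. The only per-row allocation is the key itself, which is the point of skipping full action materialization.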

How was this change tested?

Existing log replay unit tests.

codecov[bot] commented 2 days ago

Codecov Report

Attention: Patch coverage is 72.28261% with 102 lines in your changes missing coverage. Please review.

Project coverage is 79.70%. Comparing base (67cc099) to head (1b76ce5).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kernel/src/engine/arrow_data.rs | 66.08% | 23 Missing and 16 partials :warning: |
| kernel/src/actions/visitors.rs | 47.05% | 34 Missing and 2 partials :warning: |
| kernel/src/scan/log_replay.rs | 85.32% | 6 Missing and 10 partials :warning: |
| kernel/src/scan/state.rs | 54.54% | 4 Missing and 1 partial :warning: |
| kernel/src/actions/mod.rs | 70.00% | 2 Missing and 1 partial :warning: |
| kernel/src/actions/deletion_vector.rs | 90.00% | 1 Missing :warning: |
| kernel/src/actions/set_transaction.rs | 0.00% | 0 Missing and 1 partial :warning: |
| kernel/src/scan/data_skipping.rs | 0.00% | 0 Missing and 1 partial :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #494      +/-   ##
==========================================
- Coverage   79.82%   79.70%   -0.12%
==========================================
  Files          57       57
  Lines       12591    12676      +85
  Branches    12591    12676      +85
==========================================
+ Hits        10051    10104      +53
- Misses       2006     2045      +39
+ Partials      534      527       -7
```
