delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Spark][Version Checksum] Incrementally compute VersionChecksum setTransactions and domainMetadata #3895

Open dhruvarya-db opened 10 hours ago

dhruvarya-db commented 10 hours ago

Which Delta project/connector is this regarding?

Description

Follow-up to https://github.com/delta-io/delta/pull/3828. Adds support for incrementally computing the set transactions and domain metadata actions of the version checksum from the current commit and the last version checksum. Incremental computation for both action types is subject to a threshold, so that the actions are not stored when they grow too long. Tests for this behavior have been added.
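For readers unfamiliar with the idea, here is a minimal sketch of the incremental step described above. It is written in Python for brevity and is not the code in this PR. The class names, function names, and the `max_entries` parameter are hypothetical. The sketch also assumes that once a field has been dropped for exceeding its threshold, it stays absent until some full recomputation restores it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class SetTransaction:
    """Illustrative stand-in for a set transaction action."""
    app_id: str
    version: int


@dataclass(frozen=True)
class DomainMetadata:
    """Illustrative stand-in for a domain metadata action."""
    domain: str
    configuration: str
    removed: bool = False


def next_set_transactions(
    previous: Optional[Dict[str, SetTransaction]],
    commit_txns: Iterable[SetTransaction],
    max_entries: int,
) -> Optional[Dict[str, SetTransaction]]:
    """Derive the new set transactions from the last checksum plus one commit."""
    if previous is None:
        # Assumption: nothing to build on, so the field stays absent.
        return None
    merged = dict(previous)  # copy so the previous checksum is not mutated
    for txn in commit_txns:
        merged[txn.app_id] = txn  # the latest action per app id wins
    # Hypothetical threshold: do not store the field if it grew too long.
    return merged if len(merged) <= max_entries else None


def next_domain_metadata(
    previous: Optional[Dict[str, DomainMetadata]],
    commit_domains: Iterable[DomainMetadata],
    max_entries: int,
) -> Optional[Dict[str, DomainMetadata]]:
    """Same idea for domain metadata; removed domains are dropped."""
    if previous is None:
        return None
    merged = dict(previous)
    for dm in commit_domains:
        if dm.removed:
            merged.pop(dm.domain, None)
        else:
            merged[dm.domain] = dm
    return merged if len(merged) <= max_entries else None
```

Both functions return `None` when the result cannot or should not be stored, so a caller would have to fall back to a full computation whenever it needs the complete list.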

How was this patch tested?

Added new tests to DomainMetadataSuite and a new suite, DeltaIncrementalSetTransactionsSuite.

Does this PR introduce any user-facing changes?

No