Which Delta project/connector is this regarding?
Description
Follow up for https://github.com/delta-io/delta/pull/3828.
This PR adds checksum validation logic. On every checkpoint, the table state computed from the previous checkpoint plus the subsequent deltas is compared against the checksum that was written at that version. The same methods could also be used to validate more frequently, if needed.
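The comparison described above can be illustrated with a minimal sketch. This is not the code in this PR: the `TableState` fields, the `replay` helper, and the dict-based action format are hypothetical stand-ins chosen for illustration. It only shows the general idea of recomputing state from a previous checkpoint plus deltas and reporting every field that disagrees with the stored checksum.

```python
from dataclasses import dataclass, fields
from typing import Iterable, List


@dataclass(frozen=True)
class TableState:
    # Hypothetical subset of the values a checksum might record.
    table_size_bytes: int
    num_files: int
    num_metadata: int
    num_protocol: int


def replay(previous_checkpoint: TableState, deltas: Iterable[dict]) -> TableState:
    """Recompute state from the previous checkpoint plus add/remove actions."""
    size = previous_checkpoint.table_size_bytes
    files = previous_checkpoint.num_files
    for action in deltas:
        if action["op"] == "add":
            size += action["size"]
            files += 1
        elif action["op"] == "remove":
            size -= action["size"]
            files -= 1
    return TableState(
        size,
        files,
        previous_checkpoint.num_metadata,
        previous_checkpoint.num_protocol,
    )


def validate_checksum(computed: TableState, checksum: TableState) -> List[str]:
    """Return the names of all fields where the checksum disagrees."""
    return [
        f.name
        for f in fields(TableState)
        if getattr(computed, f.name) != getattr(checksum, f.name)
    ]


# Assumed example data: one file added, one file removed since the checkpoint.
previous = TableState(table_size_bytes=100, num_files=2, num_metadata=1, num_protocol=1)
deltas = [{"op": "add", "size": 50}, {"op": "remove", "size": 30}]
computed = replay(previous, deltas)

good_checksum = TableState(120, 2, 1, 1)
corrupted_checksum = TableState(120, 3, 1, 1)  # num_files is logically wrong

mismatches_good = validate_checksum(computed, good_checksum)
mismatches_bad = validate_checksum(computed, corrupted_checksum)
```

Returning the list of mismatched field names, rather than a single boolean, is a design choice in this sketch: it makes it straightforward to assert that each individually corrupted field is caught.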
How was this patch tested?
Added a new test case in ChecksumSuite that verifies the validation logic catches every logically corrupted field.
Does this PR introduce any user-facing changes?
No