delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

Exclude metadata only updates from DV check #3686

Closed: cstavr closed this pull request 2 months ago

cstavr commented 2 months ago

Which Delta project/connector is this regarding?

Description

During commit, we validate that AddFile actions do not contain Deletion Vectors (DVs) when DVs are not enabled for the table (via a table property). This restriction should not apply to actions that only update the metadata of files already in the table, such as those produced by ComputeStatistics or RowTrackingBackfill. The current code skips the check for the ComputeStatistics operation but not for other operations that perform in-place metadata updates. This PR adds a new isInPlaceFileMetadataUpdate method to Delta operations so that such operations can be easily distinguished.
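
A minimal sketch of the commit-time rule described above, written in Python purely to illustrate the control flow (Delta itself implements this in Scala). Operation, AddFile, make_dv_check and the operation constants are hypothetical, simplified stand-ins; only the idea of an isInPlaceFileMetadataUpdate flag comes from this PR, and the real signature of getAssertDeletionVectorWellFormedFunc is not shown here. The sketch assumes the check returns a per-action validator, which is what the "Func" suffix suggests but is not confirmed by the PR text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Operation:
    """Hypothetical, simplified stand-in for a Delta operation."""
    name: str
    # Mirrors the idea behind isInPlaceFileMetadataUpdate: True when the operation
    # only rewrites metadata (e.g. stats) of files that are already in the table.
    is_in_place_file_metadata_update: bool = False


@dataclass(frozen=True)
class AddFile:
    """Hypothetical, simplified stand-in for Delta's AddFile action."""
    path: str
    deletion_vector: Optional[str] = None  # DV descriptor, if the file has one


# Hypothetical operation instances, for illustration only.
COMPUTE_STATISTICS = Operation("ComputeStatistics", is_in_place_file_metadata_update=True)
ROW_TRACKING_BACKFILL = Operation("RowTrackingBackfill", is_in_place_file_metadata_update=True)
WRITE = Operation("WRITE")


def make_dv_check(op: Operation, dvs_enabled: bool) -> Callable[[AddFile], None]:
    """Return a validator applied to each AddFile in the commit (hypothetical helper)."""
    if dvs_enabled or op.is_in_place_file_metadata_update:
        # Either DVs are allowed, or the operation merely re-adds existing files
        # (which may legitimately still carry DVs) with updated metadata.
        return lambda action: None

    def check(action: AddFile) -> None:
        # Any other operation must not introduce DVs while they are disabled.
        if action.deletion_vector is not None:
            raise ValueError(
                f"{op.name}: AddFile {action.path} has a deletion vector, "
                "but deletion vectors are not enabled for this table"
            )

    return check
```

With this shape, make_dv_check(ROW_TRACKING_BACKFILL, dvs_enabled=False) accepts an AddFile that carries a DV, while the same file under WRITE raises. Keying the exemption on a property of the operation, rather than on a hard-coded ComputeStatistics case, is what lets new metadata-only operations opt out of the check without touching the validator.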

The getAssertDeletionVectorWellFormedFunc function is also slightly refactored to make it more readable.

How was this patch tested?

Existing tests provide coverage.

Does this PR introduce any user-facing changes?

No