delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

Strip column mapping metadata when feature is disabled #3688

Closed cstavr closed 2 months ago

cstavr commented 2 months ago

Which Delta project/connector is this regarding?

Description

Transactions might try to create or update the schema of a Delta table with columns that contain column mapping metadata, even when column mapping is not enabled. For example, this can happen when transactions copy the schema from another table without stripping metadata.

To avoid such issues, we automatically strip column mapping metadata when column mapping is disabled. We do this only for new tables and for transactions that add column mapping metadata for the first time. If the table's schema already contains column mapping metadata, we cannot strip it, because doing so would break the table. In that case a usage log is emitted so that we can understand the impact on existing tables.
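The decision described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Field`, `strip`, and `schemaToCommit` are hypothetical names, the schema is modelled with plain case classes instead of Spark's `StructType`, and it assumes that column mapping metadata keys share the `delta.columnMapping.` prefix (as `delta.columnMapping.id` and `delta.columnMapping.physicalName` do). The usage log for the "already present" case is omitted.

```scala
object StripSketch {
  // Hypothetical, simplified stand-in for a schema field (not Spark's StructField).
  final case class Field(
      name: String,
      metadata: Map[String, String] = Map.empty,
      children: Seq[Field] = Seq.empty)

  // Assumption: all column mapping metadata keys start with this prefix.
  val Prefix = "delta.columnMapping."

  // True if any field, at any nesting level, carries column mapping metadata.
  def hasMappingMetadata(schema: Seq[Field]): Boolean =
    schema.exists { f =>
      f.metadata.keys.exists(_.startsWith(Prefix)) || hasMappingMetadata(f.children)
    }

  // Remove column mapping keys from every field, keeping all other metadata.
  def strip(schema: Seq[Field]): Seq[Field] =
    schema.map { f =>
      f.copy(
        metadata = f.metadata.filter { case (key, _) => !key.startsWith(Prefix) },
        children = strip(f.children))
    }

  // Strip only when column mapping is disabled AND the existing schema (if any)
  // does not already contain column mapping metadata.
  def schemaToCommit(
      mappingEnabled: Boolean,
      existing: Option[Seq[Field]],
      proposed: Seq[Field]): Seq[Field] = {
    val alreadyPresent = existing.exists(hasMappingMetadata)
    if (mappingEnabled || alreadyPresent) proposed else strip(proposed)
  }
}
```

For a new table (`existing = None`) with column mapping disabled, a schema copied from a mapped table loses its `delta.columnMapping.*` keys but keeps unrelated metadata such as comments. If the existing schema already contains such keys, the proposed schema passes through unchanged.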

Note that this change covers the cases where txn.updateMetadata is called (the "proper API") and not the cases where a Metadata action is directly committed to the table.

Finally, this commit changes the drop column mapping command to check that no column mapping metadata remains at all, rather than checking only for the physical column name and ID.
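To illustrate the difference between the two checks, here is a minimal sketch. It is not the actual command code: `narrowCheck` and `broadCheck` are hypothetical names, field metadata is modelled as a flat `Map[String, String]` per field (nested fields are ignored), and the `delta.columnMapping.` prefix and the `nested.ids` key in the usage note are assumptions used for illustration.

```scala
object DropCheckSketch {
  // Simplified model: one metadata map per top-level field.
  type FieldMetadata = Map[String, String]

  val IdKey = "delta.columnMapping.id"
  val PhysicalNameKey = "delta.columnMapping.physicalName"
  // Assumption: every column mapping metadata key starts with this prefix.
  val Prefix = "delta.columnMapping."

  // Narrow check: passes as long as the ID and physical name keys are absent.
  def narrowCheck(fields: Seq[FieldMetadata]): Boolean =
    fields.forall(m => !m.contains(IdKey) && !m.contains(PhysicalNameKey))

  // Broad check: passes only if no column mapping key of any kind remains.
  def broadCheck(fields: Seq[FieldMetadata]): Boolean =
    fields.forall(m => !m.keys.exists(_.startsWith(Prefix)))
}
```

With a leftover key such as `delta.columnMapping.nested.ids` on some field, `narrowCheck` still passes while `broadCheck` fails, which is the kind of gap the stricter check is meant to close.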

How was this patch tested?

Added new unit tests.

Does this PR introduce any user-facing changes?

No.