delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0
7.62k stars 1.71k forks

[Spark] Revert column mapping protocol fix #3748

Closed andreaschat-db closed 1 month ago

andreaschat-db commented 1 month ago

Which Delta project/connector is this regarding?

Description

This reverts https://github.com/delta-io/delta/commit/920f185a04382ff466e47e75240dfda48efe40d3 because it created a backward/forward compatibility issue.

In the original PR we fixed an issue where, when column mapping was the only reader feature, it did not appear in the reader features set. This is primarily an in-memory representation issue, but it turns out the invalid protocol could also be serialized after a specific sequence of events.

The protocol action has a requirement at initialization time that only protocols with reader version 3 have the reader features set. When we fixed the column mapping bug, the requirement was expanded to also allow reader features with reader version 2. This is problematic if a table was created with an old Delta version that allowed the invalid protocol to be serialized, and the table is then read with the latest Delta version. The reverse is also problematic.
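To make the incompatibility concrete, here is a minimal sketch of the two initialization checks described above. It is a simplified model, not Delta's actual code: the `Protocol` class and the functions `passes_strict_check` and `passes_expanded_check` are hypothetical names introduced for illustration, and the example protocol (reader version 2, writer version 7, `columnMapping` as the only reader feature) is an assumed instance of the case the description talks about.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Protocol:
    """Hypothetical, simplified stand-in for the protocol action."""
    min_reader_version: int
    min_writer_version: int
    # None means the reader features set is absent from the protocol.
    reader_features: Optional[FrozenSet[str]] = None


def passes_strict_check(p: Protocol) -> bool:
    """Reader features may be set only with reader version 3."""
    return p.reader_features is None or p.min_reader_version == 3


def passes_expanded_check(p: Protocol) -> bool:
    """Expanded requirement: reader version 2 may also carry reader features."""
    return p.reader_features is None or p.min_reader_version in (2, 3)


# Assumed example: column mapping is the only reader feature at reader version 2.
mixed = Protocol(2, 7, frozenset({"columnMapping"}))
# The same versions without a reader features set.
plain = Protocol(2, 7, None)

for name, proto in [("mixed", mixed), ("plain", plain)]:
    print(name, passes_strict_check(proto), passes_expanded_check(proto))
```

In this model, `mixed` passes the expanded check but fails the strict one, while `plain` passes both. A table whose log carries a protocol like `mixed` would therefore load under one release and fail at protocol initialization under another, which is the kind of cross-version mismatch the revert is meant to avoid.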

How was this patch tested?

Clean revert. Existing tests.

Does this PR introduce any user-facing changes?

No.