delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Spark] Fix schema evolution issue with nested struct (within a map) and column renamed #3886

Open Richard-code-gig opened 5 days ago

Richard-code-gig commented 5 days ago

This PR fixes a schema evolution issue in Delta Lake: adding a new field to a struct nested within a map while also renaming an existing top-level field caused the operation to fail.

The fix includes logic to handle these transformations properly, ensuring that new fields are added without conflicts.

It also resolves a TODO about casting map types in DeltaAnalysis.scala.
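For reviewers who want a concrete picture of what casting a map of structs means here, below is a minimal conceptual sketch in plain Python. It is not the Scala code in this PR and does not use any Spark or Delta API. The helper names `cast_struct` and `cast_map_of_structs` are hypothetical, and the sketch assumes fields are matched by name, with fields missing from the data filled with null.

```python
from typing import Any, Dict, List, Optional


def cast_struct(value: Optional[Dict[str, Any]],
                target_fields: List[str]) -> Optional[Dict[str, Any]]:
    """Reshape one struct value to the target field list, matching by name."""
    if value is None:
        return None
    # Fields absent from the value become None (null); order follows the target.
    return {name: value.get(name) for name in target_fields}


def cast_map_of_structs(value: Optional[Dict[str, Dict[str, Any]]],
                        target_fields: List[str]) -> Optional[Dict[str, Any]]:
    """Apply the struct cast to every value of a map, leaving keys untouched."""
    if value is None:
        return None
    return {key: cast_struct(v, target_fields) for key, v in value.items()}


# A map value written before the struct gained the `comment` field.
old_value = {"a": {"id": 1, "value": 10}}
evolved_fields = ["id", "value", "comment"]

print(cast_map_of_structs(old_value, evolved_fields))
# {'a': {'id': 1, 'value': 10, 'comment': None}}
```

The point of the sketch is only that a map cannot be cast as a single unit when its value type is a struct whose fields differ. The cast has to be applied to each map value, field by field.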

Changes:

Related Issues:

Which Delta project/connector is this regarding?

Description

How was this patch tested?

Tested through:

Does this PR introduce any user-facing changes?

No. It only fixes an issue that is also present in released versions of Delta Lake.

Previously, an operation that added an extra field to a struct nested in a map failed with the following error: [DATATYPE_MISMATCH.CAST_WITHOUT_SUGGESTION] Cannot resolve "metrics" due to data type mismatch: cannot cast "MAP<STRING, STRUCT<id: INT, value: INT, comment: STRING>>" to "MAP<STRING, STRUCT<id: INT, value: INT>>".
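To illustrate the two types in that error message, here is a small self-contained Python model of how a target schema could be widened to accept the source schema. This is a sketch under stated assumptions, not Delta's schema merging code: types are modelled as plain strings and tuples, and `merge_types` is a hypothetical helper that only handles structs, maps, and identical primitive types.

```python
# Toy type model (an assumption for illustration only):
#   primitive -> "int", "string", ...
#   struct    -> ("struct", {field_name: type})
#   map       -> ("map", key_type, value_type)

def merge_types(target, source):
    """Return the target type widened with any new nested fields from source."""
    both_complex = isinstance(target, tuple) and isinstance(source, tuple)
    if both_complex and target[0] == source[0] == "struct":
        merged = dict(target[1])
        for name, src_type in source[1].items():
            if name in merged:
                merged[name] = merge_types(merged[name], src_type)
            else:
                merged[name] = src_type  # new field: add it to the struct
        return ("struct", merged)
    if both_complex and target[0] == source[0] == "map":
        # Recurse into both key and value types so nested structs can evolve.
        return ("map",
                merge_types(target[1], source[1]),
                merge_types(target[2], source[2]))
    if target == source:
        return target
    raise TypeError(f"cannot merge {source!r} into {target!r}")


# The two `metrics` types from the error message above.
table_type = ("map", "string", ("struct", {"id": "int", "value": "int"}))
data_type = ("map", "string",
             ("struct", {"id": "int", "value": "int", "comment": "string"}))

evolved = merge_types(table_type, data_type)
print(evolved == data_type)  # True
```

In this model the table's type gains the `comment` field instead of the incoming data being cast down to the narrower struct, which is the cast the error message reports as impossible.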

Richard-code-gig commented 3 days ago

Thanks for the insight, @johanl-db. I will try to get the stack trace for the maxIteration error too.