This PR fixes a schema evolution failure in Delta Lake: an operation that both added a new field to a struct nested within a map and renamed an existing top-level field would fail.
The fix adds logic to handle these transformations, so that new fields are added without conflicts.
It also resolves a TODO about casting map types in DeltaAnalysis.scala.
Changes:
Updated the schema evolution logic to support structs nested within maps.
Added case clauses to the addCastToColumn method in DeltaAnalysis.scala to handle MapType.
Modified the TypeWideningInsertSchemaEvolutionSuite test to cover schema evolution of maps.
Added a new method, addCastsToMaps, to DeltaAnalysis.scala.
Added EvolutionWithMap to the example modules to demonstrate the use case.
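To illustrate the idea behind casting through a map, here is a minimal sketch in plain Python. It is not the code in DeltaAnalysis.scala: `cast_expr` and `render` are hypothetical helpers, and types are modelled as strings (primitives), dicts (structs) and `("map", key, value)` tuples. It only shows how a cast can be built field by field inside a map value, using the Spark SQL functions `transform_keys`, `transform_values` and `named_struct`, rather than as a single cast on the whole map. In this model, fields missing from the source are assumed to be filled with NULL, and source fields absent from the target are dropped.

```python
def render(t):
    """Render a modelled type in the notation used by Spark error messages."""
    if isinstance(t, dict):
        return "STRUCT<" + ", ".join(f"{n}: {render(f)}" for n, f in t.items()) + ">"
    if isinstance(t, tuple):
        return f"MAP<{render(t[1])}, {render(t[2])}>"
    return t


def cast_expr(expr, source, target, depth=0):
    """Build a SQL expression string that reshapes `expr` from source to target."""
    if source == target:
        return expr  # nothing to cast
    if isinstance(source, dict) and isinstance(target, dict):
        parts = []
        for name, t in target.items():
            if name in source:
                field = cast_expr(f"{expr}.{name}", source[name], t, depth)
            else:
                # Assumption: a field the source lacks is filled with NULL.
                field = f"CAST(NULL AS {render(t)})"
            parts.append(f"'{name}', {field}")
        return "named_struct(" + ", ".join(parts) + ")"
    if isinstance(source, tuple) and isinstance(target, tuple):
        # Depth-suffixed lambda variables avoid shadowing in nested maps.
        k, v = f"k{depth}", f"v{depth}"
        out = expr
        if source[1] != target[1]:
            inner = cast_expr(k, source[1], target[1], depth + 1)
            out = f"transform_keys({out}, ({k}, {v}) -> {inner})"
        if source[2] != target[2]:
            inner = cast_expr(v, source[2], target[2], depth + 1)
            out = f"transform_values({out}, ({k}, {v}) -> {inner})"
        return out
    return f"CAST({expr} AS {render(target)})"


# Source data in the old shape, written to a column that has gained `comment`.
source = ("map", "STRING", {"id": "INT", "value": "INT"})
target = ("map", "STRING", {"id": "INT", "value": "INT", "comment": "STRING"})
plan = cast_expr("metrics", source, target)
print(plan)
```

The printed expression only rewrites the map values, since the key types already match.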
Related Issues:
Resolves: #3227
Which Delta project/connector is this regarding?
[✓] Spark
[ ] Standalone
[ ] Flink
[ ] Kernel
[ ] Other (fill in here)
How was this patch tested?
Tested through:
Integration Tests: Validated changes with Delta Lake and Spark integration. See EvolutionWithMap.
Does this PR introduce any user-facing changes?
No, it doesn't introduce any user-facing changes. It only resolves an issue that is also present in released versions of Delta Lake.
Previously, operations that added extra fields to a struct nested within a map failed with this error message:
[DATATYPE_MISMATCH.CAST_WITHOUT_SUGGESTION] Cannot resolve "metrics" due to data type mismatch: cannot cast "MAP<STRING, STRUCT<id: INT, value: INT, comment: STRING>>" to "MAP<STRING, STRUCT<id: INT, value: INT>>".
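The two types in that message differ only by the `comment` field inside the map's value struct. As a rough illustration of what schema evolution is expected to do here, the sketch below merges the two types in plain Python. It is a simplified model, not Delta Lake's merge logic: `merge` and `render` are hypothetical helpers, and types are modelled as strings (primitives), dicts (structs) and `("map", key, value)` tuples.

```python
def render(t):
    """Render a modelled type in the notation used by the error message."""
    if isinstance(t, dict):
        return "STRUCT<" + ", ".join(f"{n}: {render(f)}" for n, f in t.items()) + ">"
    if isinstance(t, tuple):
        return f"MAP<{render(t[1])}, {render(t[2])}>"
    return t


def merge(target, source):
    """Return the target type extended with fields that only the source has."""
    if isinstance(target, dict) and isinstance(source, dict):
        merged = {
            name: merge(t, source[name]) if name in source else t
            for name, t in target.items()
        }
        # Fields present only in the source are appended after existing ones.
        for name, s in source.items():
            merged.setdefault(name, s)
        return merged
    if isinstance(target, tuple) and isinstance(source, tuple):
        # Recurse into both the key type and the value type of the map.
        return ("map", merge(target[1], source[1]), merge(target[2], source[2]))
    if target != source:
        raise TypeError(f"cannot merge {render(source)} into {render(target)}")
    return target


table_type = ("map", "STRING", {"id": "INT", "value": "INT"})
data_type = ("map", "STRING", {"id": "INT", "value": "INT", "comment": "STRING"})
evolved = merge(table_type, data_type)
print(render(table_type))
print(render(evolved))
```

In this model the evolved column type matches the incoming data, so no cast back to the narrower struct is needed.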