Background
Currently, conformance rules require that output columns be at the same struct level as their input columns (e.g. in the casting rule). This limitation seems too strict.
Feature
Output columns can be at different struct levels as long as the array boundary is not crossed.
Example [Optional]
Currently you can't use the casting rule to convert struct1.key3 to string and save it to key3_String: the output column has to be at the same level, e.g. struct1.key3_String. Saving it to key3_String should be allowed.
Converting array1.key7 to string and saving it to key7_String should stay disallowed since it crosses the array boundary. That conversion can only be allowed if the output column is also inside the array, e.g. array1.key7_String.
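The rule described above can be sketched as a small check. This is only an illustration in plain Python, not the conformance framework's or spark-hats' actual implementation. The helper names `enclosing_array` and `output_allowed` and the `ARRAY_PATHS` set are hypothetical, and the sketch assumes the schema is summarized as the set of dot-separated paths that have array type, and that "crossing the array boundary" means the input and output columns do not share the same innermost enclosing array.

```python
def enclosing_array(column, array_paths):
    """Return the deepest array-typed proper ancestor of `column`, or None."""
    parts = column.split(".")
    # Walk from the immediate parent up towards the root.
    for depth in range(len(parts) - 1, 0, -1):
        ancestor = ".".join(parts[:depth])
        if ancestor in array_paths:
            return ancestor
    return None


def output_allowed(input_col, output_col, array_paths):
    """Allow the output column if it sits inside the same innermost array
    as the input column (or if neither column is inside any array)."""
    return enclosing_array(input_col, array_paths) == enclosing_array(
        output_col, array_paths
    )


# Hypothetical schema summary: only `array1` is an array; `struct1` is a struct.
ARRAY_PATHS = {"array1"}

print(output_allowed("struct1.key3", "key3_String", ARRAY_PATHS))        # True
print(output_allowed("array1.key7", "key7_String", ARRAY_PATHS))         # False
print(output_allowed("array1.key7", "array1.key7_String", ARRAY_PATHS))  # True
```

Under these assumptions, moving a value between struct levels is accepted, while moving it out of (or into) an array is rejected.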
Proposed Solution [Optional]
spark-hats provides a unified way of accessing columns at arbitrary levels of nesting as long as array boundaries are not crossed. We can use that in conformance rules, similar to how it is done for the broadcasting mapping rule strategy.