AbsaOSS / enceladus

Dynamic Conformance Engine
Apache License 2.0

Support output column on a different level #1279

Open yruslan opened 4 years ago

yruslan commented 4 years ago

Background

Currently, conformance rules require that output columns be at the same level as input columns (e.g. the casting rule). This restriction seems too strict.

Feature

Output columns can be at a different struct level than input columns, as long as no array boundary is crossed.

Example

Here is an example schema:

root
 |-- id: long (nullable = true)
 |-- key1: long (nullable = true)
 |-- key2: long (nullable = true)
 |-- struct1: struct (nullable = true)
 |    |-- key3: integer (nullable = true)
 |    |-- key4: integer (nullable = true)
 |-- array1: array (nullable = true)
 |    |-- element: struct
 |    |    |-- key7: long (nullable = true)
 |    |    |-- key8: long (nullable = true)
 |    |    |-- skey2: string (nullable = true)
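To make the "array boundary" condition concrete against this schema, here is a minimal sketch in plain Python. It is not part of Enceladus or spark-hats. The helper names (`innermost_array`, `same_array_level`), the output column names ending in `_conformed`, and the dict/list encoding of the schema are all hypothetical. It also assumes one reading of the rule: an input/output pair is acceptable when both columns sit inside the same innermost array (or both sit outside any array).

```python
# Hypothetical illustration only; not the Enceladus or spark-hats API.
# Structs are modelled as dicts, arrays as one-element lists holding the element type.
SCHEMA = {
    "id": "long",
    "key1": "long",
    "key2": "long",
    "struct1": {"key3": "integer", "key4": "integer"},
    "array1": [{"key7": "long", "key8": "long", "skey2": "string"}],
}


def innermost_array(schema, path):
    """Return the dotted path of the deepest array enclosing `path`, or "" if none.

    Only the parent segments are resolved, so the last segment may name
    an output column that does not exist yet.
    """
    node = schema
    enclosing = ""
    walked = []
    for name in path.split(".")[:-1]:
        if not isinstance(node, dict) or name not in node:
            raise ValueError(f"Unknown parent field '{name}' in '{path}'")
        node = node[name]
        walked.append(name)
        if isinstance(node, list):
            # Stepping into an array: remember it and continue with its element type.
            enclosing = ".".join(walked)
            node = node[0]
    return enclosing


def same_array_level(schema, input_col, output_col):
    """True if both columns share the same innermost enclosing array."""
    return innermost_array(schema, input_col) == innermost_array(schema, output_col)


# Different struct level, no array involved: acceptable under this reading.
print(same_array_level(SCHEMA, "key1", "struct1.key1_conformed"))        # True
# Both columns inside array1: acceptable.
print(same_array_level(SCHEMA, "array1.key7", "array1.key7_conformed"))  # True
# Root-level input, output inside array1: crosses an array boundary.
print(same_array_level(SCHEMA, "key1", "array1.key1_conformed"))         # False
```

Under this reading, `key1` to `struct1.key1_conformed` would be newly allowed by the feature, while `key1` to `array1.key1_conformed` would still be rejected.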

Proposed Solution

spark-hats provides a unified way of accessing columns at arbitrary levels of nesting, as long as array boundaries are not crossed. We can use that in conformance rules, similar to how it is already done for the broadcasting mapping rule strategy.

Zejnilovic commented 4 years ago

We previously called this a bug: #759