AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache License 2.0

Add a feature to collapse structs of the output data #685

Closed yruslan closed 5 months ago

yruslan commented 5 months ago

Background

Currently, we have 2 options for schema transformation:

.option("schema_retention_policy", "keep_original") 
.option("schema_retention_policy", "collapse_root") 

Field names in mainframe copybooks are usually unique, even when the fields are part of nested structs. Cobrix could therefore remove all struct nesting, stopping only when an array or a primitive is encountered.
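To make the intended behavior concrete, here is a minimal sketch in plain Python. It is not Cobrix code: records are modeled as nested dicts (structs), lists (arrays) and scalars (primitives), and the function name collapse_structs and the sample field names are hypothetical. It assumes field names are unique and raises an error when they are not.

```python
from typing import Any, Dict


def collapse_structs(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hoist the fields of nested structs (dicts) to the top level.

    Arrays (lists) and primitives are kept as-is, so any nesting
    inside an array is left untouched.
    """
    flat: Dict[str, Any] = {}
    for name, value in record.items():
        if isinstance(value, dict):
            # A struct: drop its own name and hoist its (collapsed) fields.
            children = collapse_structs(value)
        else:
            # An array or a primitive: stop collapsing here.
            children = {name: value}
        for child_name, child_value in children.items():
            if child_name in flat:
                # The approach relies on unique field names (an assumption).
                raise ValueError(
                    f"Duplicate field name after collapsing: {child_name}"
                )
            flat[child_name] = child_value
    return flat


# Hypothetical record, as it might look with nested structs retained.
record = {
    "RECORD": {
        "ID": 1,
        "COMPANY": {"NAME": "ABC", "ADDRESS": {"CITY": "Prague"}},
        "CONTACTS": [{"PHONE": {"NUMBER": "123"}}],
    }
}

collapsed = collapse_structs(record)
print(collapsed)
```

In this sketch RECORD, COMPANY and ADDRESS disappear and their fields move to the top level, while CONTACTS stays an array whose elements keep their inner PHONE struct. A real implementation would have to decide what to do when names collide; raising an error is only one possible choice.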

Feature

Add a feature to collapse structs of the output data.

Proposed Solution [Optional]

Solution Ideas

  1. Add a new option
    .option("schema_retention_policy", "collapse_struct") 

    that collapses structs on the fly, OR

  2. Add a method to SparkUtils that collapses structs as a post-processing step.