awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Fallback DynamicDataFrame.resolveChoice behaviour #157

Open FIAV1 opened 1 year ago

FIAV1 commented 1 year ago

Hi, I'd like to propose a useful option for DynamicDataFrame.resolveChoice(). When the available choices are unknown until runtime, it would be great if resolveChoice() had an option that automatically finds the "biggest" (widest) type among the available ones and casts all the others to it. This should happen for each field that has a ChoiceType(), e.g. from this:

root
|-- field1: choice
|    |-- int: int
|    |-- long: long
|-- field2: choice
|    |-- float: float
|    |-- decimal: decimal

To this:

root
|-- field1: long
|-- field2: decimal

Of course, this should happen only if the data types are compatible, so in cases like this:

root
|-- field1: choice
|    |-- array: array
|    |-- map: map

The job should error out.
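
To make the proposal concrete, here is a rough sketch of the fallback logic in plain Python. It is not part of the Glue library: `WIDENING_ORDER`, `widest_type` and `build_cast_specs` are hypothetical names, and the widening order itself is an assumption chosen to match the two examples above. The input is a plain dict of field name to observed choice types; extracting that dict from a real schema is left out.

```python
# Hypothetical widening order, narrowest to widest.
# This ranking is an assumption for illustration, not Glue's own rule set.
WIDENING_ORDER = ["int", "long", "float", "double", "decimal"]


def widest_type(choices):
    """Return the widest type among `choices`, or raise if any is not rankable."""
    unknown = [t for t in choices if t not in WIDENING_ORDER]
    if unknown:
        # Types such as array or map have no common "bigger" type, so fail loudly.
        raise TypeError(f"incompatible choice types: {sorted(choices)}")
    return max(choices, key=WIDENING_ORDER.index)


def build_cast_specs(choice_fields):
    """Build (field, "cast:<type>") pairs, one per field with a choice type."""
    return [
        (field, "cast:" + widest_type(types))
        for field, types in choice_fields.items()
    ]


specs = build_cast_specs({
    "field1": ["int", "long"],
    "field2": ["float", "decimal"],
})
print(specs)
```

The resulting pairs follow the `(field, "cast:<type>")` shape that resolveChoice() accepts through its `specs` argument, so in principle they could be passed along as `specs=specs`. With `{"field1": ["array", "map"]}` the sketch raises a `TypeError`, which mirrors the "job should error out" case.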