apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Support RecordBatch.flatten #6369

Open kszlim opened 1 month ago

kszlim commented 1 month ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I want to write flattened Parquet files, because not every consumer supports struct columns.

Describe the solution you'd like

Recursively flatten all struct columns in a RecordBatch (similar to pandas json_normalize). Alternatively, a solution via DataFusion might be acceptable.
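To make the request concrete, here is a minimal sketch of the recursive flattening described above. It uses a toy `Column` type built only on the standard library; `Column` and `flatten_all` are hypothetical names for illustration and are not part of the arrow-rs API. Joining parent and child names with `.` is an assumption that mirrors how pyarrow.Table.flatten names its output columns, and struct-level nulls are ignored here.

```rust
// Toy model, NOT the arrow-rs API: a column is either a leaf holding values
// or a struct holding named child columns.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Leaf(Vec<i64>),
    Struct(Vec<(String, Column)>),
}

/// Recursively replaces every struct column with its leaf descendants.
/// Child names are prefixed with the parent name and a '.' separator
/// (an assumed naming convention).
fn flatten_all(columns: Vec<(String, Column)>) -> Vec<(String, Column)> {
    let mut out = Vec::new();
    for (name, col) in columns {
        match col {
            // Leaves are kept as they are.
            leaf @ Column::Leaf(_) => out.push((name, leaf)),
            // Structs are expanded, then the children are flattened in turn.
            Column::Struct(children) => {
                let prefixed = children
                    .into_iter()
                    .map(|(child, c)| (format!("{name}.{child}"), c))
                    .collect();
                out.extend(flatten_all(prefixed));
            }
        }
    }
    out
}
```

In this sketch a batch with columns `id` and `a { b, c { d } }` would come out as `id`, `a.b`, `a.c.d`, with column order preserved.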

Describe alternatives you've considered

Running pyarrow.Table.flatten in a loop until there are no more top-level struct columns, though this requires going through Python.

alamb commented 1 month ago

I think implementing the equivalent of https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten for RecordBatch makes sense to me.

kszlim commented 1 month ago

If implemented similarly to pandas json_normalize, it could take a max depth option, which would make it strictly more powerful and flexible than pyarrow.Table.flatten.
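A rough sketch of what a max depth option could look like, again on a std-only toy model rather than the arrow-rs API. `Column`, `flatten_to_depth`, and the `max_depth` parameter are hypothetical names. The assumed semantics are: `None` flattens fully, `Some(0)` leaves the input unchanged, and `Some(1)` expands only top-level structs, which corresponds to a single pyarrow.Table.flatten call.

```rust
// Toy model, NOT the arrow-rs API: a column is either a leaf holding values
// or a struct holding named child columns.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Leaf(Vec<i64>),
    Struct(Vec<(String, Column)>),
}

/// Flattens struct columns up to `max_depth` levels (hypothetical parameter).
/// `None` means no limit; `Some(0)` means do not flatten at all.
fn flatten_to_depth(
    columns: Vec<(String, Column)>,
    max_depth: Option<usize>,
) -> Vec<(String, Column)> {
    let mut out = Vec::new();
    for (name, col) in columns {
        match col {
            // Expand a struct only while there is depth budget left.
            Column::Struct(children) if max_depth != Some(0) => {
                let prefixed = children
                    .into_iter()
                    .map(|(child, c)| (format!("{name}.{child}"), c))
                    .collect();
                // The guard guarantees d >= 1 here, so d - 1 cannot underflow.
                out.extend(flatten_to_depth(prefixed, max_depth.map(|d| d - 1)));
            }
            // Leaves, and structs beyond the depth limit, pass through unchanged.
            other => out.push((name, other)),
        }
    }
    out
}
```

With the unlimited setting this behaves like the loop over pyarrow.Table.flatten described earlier in the thread, while a finite depth keeps deeper structs intact as struct columns.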