apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Support RecordBatch.flatten #6369

Open kszlim opened 2 months ago

kszlim commented 2 months ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I want to write flattened Parquet files, since not every tool supports struct columns.

Describe the solution you'd like

Recursively flatten all struct columns in a RecordBatch (similar to pandas json_normalize). Alternatively, a solution via DataFusion might be acceptable.

Describe alternatives you've considered

Running pyarrow.Table.flatten in a loop until there are no more top-level struct columns, though this requires going through Python.
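
To make the loop-based alternative concrete, here is a minimal sketch of "flatten one level, repeat until no top-level structs remain". It uses a toy, std-only schema model rather than the arrow-rs API: `Column`, `flatten_once`, and `flatten_fully` are hypothetical names, and the `parent.child` naming is an assumption modeled on how pyarrow.Table.flatten names flattened fields. It only tracks column names, not data or null handling.

```rust
// Toy schema model for illustration only. This is NOT the arrow-rs API.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    // A non-struct column, identified only by its name.
    Leaf(String),
    // A struct column: its name plus its child columns.
    Struct(String, Vec<Column>),
}

// One flattening pass: every top-level struct is replaced by its children,
// renamed to "parent.child" (assumed naming). Structs nested inside those
// children are kept as structs, so deeper levels need another pass.
fn flatten_once(columns: &[Column]) -> Vec<Column> {
    let mut out = Vec::new();
    for col in columns {
        match col {
            Column::Leaf(_) => out.push(col.clone()),
            Column::Struct(parent, children) => {
                for child in children {
                    out.push(match child {
                        Column::Leaf(name) => Column::Leaf(format!("{parent}.{name}")),
                        Column::Struct(name, inner) => {
                            Column::Struct(format!("{parent}.{name}"), inner.clone())
                        }
                    });
                }
            }
        }
    }
    out
}

// Repeat single-level passes until no top-level struct column is left.
// Each pass removes one level of nesting, so the loop terminates.
fn flatten_fully(mut columns: Vec<Column>) -> Vec<Column> {
    while columns.iter().any(|c| matches!(c, Column::Struct(..))) {
        columns = flatten_once(&columns);
    }
    columns
}
```

In this model, a column `a` containing `x` and a nested struct `b` with field `y` becomes `a.x` and the struct `a.b` after one pass, and `a.x` and `a.b.y` after the loop finishes.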

alamb commented 2 months ago

I think implementing the equivalent of https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten for RecordBatch makes sense to me.

kszlim commented 2 months ago

If it were implemented like pandas json_normalize, it could take a max depth option, which would make it strictly more powerful and flexible than pyarrow.Table.flatten.
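
As a rough illustration of what a max depth option could mean, here is a sketch of depth-limited recursive flattening over a toy, std-only schema model. Nothing here is arrow-rs API: `Column`, `flatten`, and the `max_depth` parameter are hypothetical, and the `parent.child` naming is an assumption. `None` is taken to mean "no limit", and `Some(n)` expands at most `n` levels of struct nesting.

```rust
// Toy schema model for illustration only. This is NOT the arrow-rs API.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    // A non-struct column, identified only by its name.
    Leaf(String),
    // A struct column: its name plus its child columns.
    Struct(String, Vec<Column>),
}

// Recursively flatten struct columns.
// `max_depth` (hypothetical option): None = flatten everything,
// Some(n) = expand at most n levels of nesting, Some(0) = leave input as is.
fn flatten(columns: &[Column], max_depth: Option<usize>) -> Vec<Column> {
    let mut out = Vec::new();
    for col in columns {
        match col {
            // Only expand a struct while there is depth budget left.
            Column::Struct(parent, children) if max_depth != Some(0) => {
                // The guard rules out Some(0), so the subtraction cannot underflow.
                let remaining = max_depth.map(|d| d - 1);
                for child in flatten(children, remaining) {
                    // Prefix each (already flattened) child with the parent name.
                    out.push(match child {
                        Column::Leaf(name) => Column::Leaf(format!("{parent}.{name}")),
                        Column::Struct(name, inner) => {
                            Column::Struct(format!("{parent}.{name}"), inner)
                        }
                    });
                }
            }
            // Leaves, and structs beyond the depth budget, pass through unchanged.
            _ => out.push(col.clone()),
        }
    }
    out
}
```

Under these assumptions, `Some(1)` behaves like a single pyarrow.Table.flatten call, while `None` corresponds to the fully recursive behavior requested in the issue.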

ngli-me commented 1 week ago

Hi, do you all mind if I give this a shot?

kszlim commented 1 week ago

> Hi, do you all mind if I give this a shot?

Go ahead!

ngli-me commented 1 week ago

take