apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

StructArray take doesn't make fields nullable #6727

Open gatesn opened 1 week ago

gatesn commented 1 week ago

Describe the bug

When calling arrow::compute::take on a StructArray with non-nullable fields and passing take indices that contain null values, the resulting StructArray still has non-nullable fields. This is an invalid state.

Expected behavior

The take function should convert all fields to nullable iff the take indices contain any nulls.

https://docs.rs/arrow-select/53.2.0/src/arrow_select/take.rs.html#238-239
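
To make the expected behavior concrete, here is a minimal sketch using simplified stand-in types. `Field`, `StructColumn`, `take_widening`, and `example_input` are hypothetical names invented for this illustration and are not the arrow-rs API; the sketch only models the rule "widen fields to nullable iff the indices contain a null".

```rust
// Simplified stand-ins for illustration only; these are NOT the arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct StructColumn {
    fields: Vec<Field>,
    // One Vec per field; `None` marks a null slot.
    columns: Vec<Vec<Option<i32>>>,
}

/// Hypothetical take: gathers rows by index, and a `None` index yields a null row.
/// Fields are widened to nullable only when at least one index is null.
/// Panics on an out-of-bounds index, which keeps the sketch short.
fn take_widening(input: &StructColumn, indices: &[Option<usize>]) -> StructColumn {
    let has_null_index = indices.iter().any(|i| i.is_none());
    let fields = input
        .fields
        .iter()
        .map(|f| Field {
            name: f.name.clone(),
            nullable: f.nullable || has_null_index,
        })
        .collect();
    let columns = input
        .columns
        .iter()
        .map(|col| indices.iter().map(|i| i.and_then(|i| col[i])).collect())
        .collect();
    StructColumn { fields, columns }
}

/// Two non-nullable fields, two rows, no nulls.
fn example_input() -> StructColumn {
    StructColumn {
        fields: vec![
            Field { name: "a".to_string(), nullable: false },
            Field { name: "b".to_string(), nullable: false },
        ],
        columns: vec![vec![Some(10), Some(20)], vec![Some(1), Some(2)]],
    }
}
```

Calling `take_widening(&example_input(), &[Some(1), None])` in this model produces a null in the second row of every column and marks both fields nullable, whereas indices without nulls leave the fields unchanged.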

tustvold commented 1 week ago

Generally I wouldn't expect a selection kernel to alter the schema, so I think in this case it should raise an error
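
The alternative of raising an error could look roughly like the sketch below. It is a simplified model, not the arrow-rs implementation: `Field`, `TakeError`, and `check_take` are hypothetical names used only to illustrate a check that rejects null indices when any field is non-nullable and leaves the schema untouched otherwise.

```rust
// Simplified stand-ins for illustration only; these are NOT the arrow-rs types.
#[derive(Debug, PartialEq)]
struct Field {
    name: &'static str,
    nullable: bool,
}

#[derive(Debug, PartialEq)]
enum TakeError {
    /// Carries the name of the first non-nullable field that would receive a null.
    NullIndexOnNonNullableField(&'static str),
}

/// Hypothetical pre-check: the schema is never altered. If the indices contain
/// a null and some field is non-nullable, the take is rejected instead.
fn check_take(fields: &[Field], indices: &[Option<usize>]) -> Result<(), TakeError> {
    // Without null indices no new nulls can appear, so any schema is fine.
    if indices.iter().all(|i| i.is_some()) {
        return Ok(());
    }
    match fields.iter().find(|f| !f.nullable) {
        Some(f) => Err(TakeError::NullIndexOnNonNullableField(f.name)),
        None => Ok(()),
    }
}
```

Under this design the caller has to make the fields nullable explicitly before taking with null indices, which keeps the selection kernel from changing the schema behind the caller's back.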

gatesn commented 1 week ago

Yes that's also reasonable.


It's a bit annoying that Arrow DataTypes don't themselves have a nullable flag, since the selection kernels over non-nested arrays can also introduce nulls into previously non-null arrays.

irenjj commented 1 week ago

take

tustvold commented 1 week ago

It's a bit annoying that Arrow DataTypes don't themselves have a nullable flag

One way to get this is to use StructArray in place of RecordBatch. This is actually what a lot of the IO logic in arrow-rs does, converting to RecordBatch only at the edges.

IMO RecordBatch is confusing and arrow would be better off without it, but it's too late for that now 😅

malikrohail commented 4 days ago

Can I fix this?