apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0
2.6k stars 793 forks source link

`ListArray::try_new` rejects `{Union,Dictionary}Array` incorrectly if `field` is not nullable #6538

Open kawadakk opened 1 month ago

kawadakk commented 1 month ago

Describe the bug ListArray::try_new() uses Array::is_nullable() to check the presence of null values if field is not marked as nullable. https://github.com/apache/arrow-rs/blob/5508978a3c5c4eb65ef6410e097887a8adaba38a/arrow-array/src/array/list_array.rs#L191-L197

While this works for most types, this can cause a false positive for DictionaryArray and UnionArray because Array::is_nullable() is allowed to return true if it's expensive to prove the absence of logical nulls.

https://github.com/apache/arrow-rs/blob/5508978a3c5c4eb65ef6410e097887a8adaba38a/arrow-array/src/array/mod.rs#L287-L291

To Reproduce

let offsets = OffsetBuffer::new(vec![0, 1, 4, 5].into());
let mut builder = UnionBuilder::new_dense();
builder.append::<Int32Type>("a", 1).unwrap();
builder.append::<Int32Type>("b", 2).unwrap();
builder.append::<Int32Type>("b", 3).unwrap();
builder.append::<Int32Type>("a", 4).unwrap();
builder.append::<Int32Type>("a", 5).unwrap();
let values = builder.build().unwrap();
let field = Arc::new(Field::new("element", values.data_type().clone(), false));
ListArray::new(field.clone(), offsets, Arc::new(values), None);

This produces:

thread 'foo' panicked at foo.rs:10:46:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Non-nullable field of ListArray \"element\" cannot contain nulls")

Expected behavior Successful execution

Additional context

tustvold commented 1 month ago

This looks to have been introduced for UnionArray in https://github.com/apache/arrow-rs/pull/6303. This isn't so much of a problem for DictionaryArray given how rare nulls in dictionary values are