Open alamb opened 3 weeks ago
I think this makes sense, the only thing to be careful with is arrays where null values may have undefined contents, e.g. dictionaries. In such cases, allowing users to go from null to non-null could have safety implications
FWIW the nullif kernel is very similar to this, but with the caveat that nulls can only remain null, avoiding the above issue
Edit: in fact the operation you describe is the nullif kernel I think...
I think the nullif kernel provides this, so perhaps this can be closed?
Sounds good -- the current documentation on nullif is pretty sparse, so perhaps some better docs would make it easier to discover
https://docs.rs/arrow/latest/arrow/compute/kernels/nullif/fn.nullif.html
I'll try and find some time
PR to improve the docs: https://github.com/apache/arrow-rs/pull/6658
After doing that, I realized nullif is somewhat the inverse of what I was looking for (it sets the element to null when the mask is true, rather than setting the element to null when the mask is not true).
I will attempt a PR with a proposed API as well for consideration
I made https://github.com/apache/arrow-rs/pull/6659 to add Array::with_nulls, but it has the issues @tustvold describes
The actual use case I have is not to turn null elements non-null, but instead to make them null if a boolean array is not true (the inverse of what nullif does).
Maybe a better API would be something like nullifnot or similar.
I will ponder
You could either negate the boolean array, or construct a BooleanArray with a null buffer of the buffer you want to be null if false, and a values buffer of entirely false. Neither is exactly ideal, but the performance, especially of the latter, is likely going to be hard to beat.
🤔
this is how we do it today in DataFusion: https://github.com/apache/datafusion/blob/f23360f1e0f90abcf92a067de55a9053da545ed2/datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/nulls.rs#L53-L102
(basically call NullBuffer::union) maybe that is good enough 🤔
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While implementing https://github.com/apache/datafusion/pull/12792 and various other things in DataFusion, I find myself often wanting to combine a filter and a null mask.
The relevant code is like
Describe the solution you'd like
I would like an API like with_nulls that returns a new array with the same data but a new null mask so my code would look like
Describe alternatives you've considered
I can keep using the unsafe APIs