Open dannygoldstein opened 1 year ago
@dannygoldstein thanks for the report! That's indeed a bug. The problem is that a union array has no top-level validity bitmap; its validity is instead defined by the validity bitmaps of its child arrays. The is_null kernel should take this into account, which doesn't seem to happen.
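To illustrate the point about child validity, here is a minimal plain-Python sketch of a dense union layout. It is not pyarrow code: the names `dense_union_is_null`, `type_ids`, `offsets`, and `children` are hypothetical, children are modeled as lists with `None` marking a null slot, and type ids are assumed to be the child indices 0..n-1.

```python
def dense_union_is_null(type_ids, offsets, children):
    """Return one bool per union slot: True if the referenced child value is null.

    A dense union has no validity bitmap of its own, so nullness has to be
    looked up in the child selected by each slot's type id and offset.
    """
    result = []
    for type_id, offset in zip(type_ids, offsets):
        child = children[type_id]  # assumption: type id == child index
        result.append(child[offset] is None)
    return result


# Hypothetical example: child 0 holds ints, child 1 holds strings.
children = [[1, None], ["a", None, "c"]]
type_ids = [0, 1, 1, 0, 1]
offsets = [0, 0, 1, 1, 2]

nulls = dense_union_is_null(type_ids, offsets, children)
print(nulls)  # [False, False, True, True, False]
```

A union-aware is_null would need to do the equivalent of this per-slot lookup rather than consult a top-level bitmap.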
The problem is also that the null_count attribute is already wrong (and it might be that is_null is taking a shortcut because of that):
>>> dense.null_count
0
thanks for the quick response @jorisvandenbossche! and thanks also for all the great work on arrow. it is an awesome package :)
Looking at the kernel, it seems both problems are there. It does indeed shortcut based on null_count and, even if it didn't, there is no special logic for unions (it just grabs the validity bitmap).
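The two problems can be modeled in a few lines of plain Python. This is only a sketch of the behavior described in this thread, not the actual Arrow C++ kernel; `FakeArray` and `is_null_generic` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeArray:
    """Hypothetical stand-in for an array: its length, its reported
    null_count, and an optional top-level validity bitmap (True = valid)."""
    length: int
    null_count: int
    validity: Optional[List[bool]]


def is_null_generic(arr: FakeArray) -> List[bool]:
    # Problem 1: shortcut on the reported null_count.
    if arr.null_count == 0:
        return [False] * arr.length
    # Problem 2: only the top-level bitmap is consulted; with no bitmap,
    # every slot is treated as valid.
    if arr.validity is None:
        return [False] * arr.length
    return [not valid for valid in arr.validity]


# A union-like array: no top-level bitmap and null_count reported as 0,
# even if some child values are null.
union_like = FakeArray(length=3, null_count=0, validity=None)
print(is_null_generic(union_like))  # [False, False, False]
```

In this model, fixing only the null_count shortcut would not help, because the second branch still never looks at the children.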
Describe the bug, including details regarding any error messages, version, and platform.
In pyarrow versions 11.0.0 and 10.0.1, if I create a dense union array with some null elements, pa.compute.is_null() returns that they are not null. Repro:
Illustration of the first issue:
Illustration of the second issue: if I call pa.compute.is_null() on a null element of the array, I get a segfault:
Component(s)
Python