apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

null structarrays are poorly handled by cast #37072

Open spenczar opened 1 year ago

spenczar commented 1 year ago

Describe the bug, including details regarding any error messages, version, and platform.

This code should work:

```python
import pyarrow as pa

struct_type = pa.struct([pa.field("x", pa.int32(), nullable=False)])
nulls = pa.nulls(5, struct_type)

# The following is an error:
nulls = nulls.cast(struct_type)
```

The error message is:

```
ArrowInvalid: Can't view array of type struct<x: int32 not null> as struct<x: int32 not null>: nulls in input cannot be viewed as non-nullable
```

Indeed, print(nulls) shows that the non-nullable field x contains null values:

```
-- is_valid:
  [
    false,
    false,
    false,
    false,
    false
  ]
-- child 0 type: int32
  [
    null,
    null,
    null,
    null,
    null
  ]
```

But every slot is already null at the top (struct) level, so there's no reason for cast to care what the child array holds. The alternative would be to make it impossible to call pa.nulls on a struct with a non-nullable field anywhere in its field hierarchy, but that seems wrong too: it would imply that if any field is non-nullable, the whole struct must be non-nullable, which is clearly not the intent. You should be able to have a null struct with non-nullable fields.
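The argument above depends on how validity composes in nested arrays: a child slot only carries meaning when the enclosing struct slot is valid. Below is a minimal pure-Python sketch of a parent-aware non-nullability check, assuming validity is modeled as plain Python lists. `satisfies_non_nullable` is a hypothetical helper for illustration, not part of pyarrow or Arrow C++.

```python
from typing import List, Optional


def satisfies_non_nullable(parent_valid: List[bool],
                           child_values: List[Optional[int]]) -> bool:
    """Hypothetical check for a non-nullable child field of a struct.

    A child null only violates the schema where the parent struct slot
    is valid; under a null parent slot the child value is ignored.
    """
    return all(value is not None
               for is_valid, value in zip(parent_valid, child_values)
               if is_valid)


# Layout printed above for pa.nulls(5, struct_type): parent and child all null.
print(satisfies_non_nullable([False] * 5, [None] * 5))  # True

# A valid struct slot holding a null child genuinely violates the schema.
print(satisfies_non_nullable([True, False], [None, 1]))  # False
```

Under this model, an all-null struct is acceptable whatever its children contain, which is the behavior the report expects from cast.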

Ultimately, this is a C++ issue; Python is merely calling into the underlying C++ functions.

Version

12.0.1

Component(s)

C++

spenczar commented 1 year ago

On closer inspection, I'm now wondering whether the blame really lies with pa.nulls (that is, the C++ function MakeArrayOfNull).

If I do this:

```python
nulls = pa.array([None, None, None, None, None], struct_type)
```

Then everything works fine!

```python
nulls = nulls.cast(struct_type)
print(nulls)
```

```
-- is_valid:
  [
    false,
    false,
    false,
    false,
    false
  ]
-- child 0 type: int32
  [
    0,
    0,
    0,
    0,
    0
  ]
```
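One way to read the difference between the two constructions, sketched in pure Python below: pa.array fills the child slots under null parents with placeholder zeros, while pa.nulls leaves them null. A check that inspects the child array in isolation (assumed here to approximate what the error message describes; it is not Arrow's actual code) accepts only the pa.array layout, whereas a parent-aware check accepts both. `strict_check` and `parent_aware_check` are hypothetical helpers.

```python
from typing import List, Optional


def strict_check(child_values: List[Optional[int]]) -> bool:
    # Hypothetical: looks only at the child, ignoring the parent's validity.
    return all(value is not None for value in child_values)


def parent_aware_check(parent_valid: List[bool],
                       child_values: List[Optional[int]]) -> bool:
    # Hypothetical: child nulls are allowed wherever the parent slot is null.
    return all(value is not None
               for is_valid, value in zip(parent_valid, child_values)
               if is_valid)


parent = [False] * 5
from_nulls = [None] * 5  # child layout printed for pa.nulls(5, struct_type)
from_array = [0] * 5     # child layout printed for pa.array([None] * 5, struct_type)

results = {
    "pa.nulls": (strict_check(from_nulls), parent_aware_check(parent, from_nulls)),
    "pa.array": (strict_check(from_array), parent_aware_check(parent, from_array)),
}
for name, (strict, aware) in results.items():
    print(f"{name}: strict={strict}, parent_aware={aware}")
# pa.nulls: strict=False, parent_aware=True
# pa.array: strict=True, parent_aware=True
```

Both layouts describe the same logical array (five null structs), so only a check that ignores parent validity can tell them apart.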
felipecrv commented 1 year ago

Subtle issue! Nice analysis. I investigated it a bit, starting from MakeArrayOfNull, but the root cause seems to be bugs in MakeDataView. If I have time, I will come up with a fix soon.