apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Python][C++] Casting a list of struct array with null field results in invalid result #43838

Open Angel-212 opened 2 months ago

Angel-212 commented 2 months ago

Describe the bug, including details regarding any error messages, version, and platform.

OS: macOS Sonoma 14.6.1
Python: 3.10.14
pyarrow: 17.0.0

Snippet to reproduce:

import pyarrow as pa

# Struct with a null-typed field 'b' and a string field 'g', wrapped in a list
fields = [('b', pa.null()), ('g', pa.string())]
dtype = pa.list_(pa.struct(fields))
arr = pa.array([[{'b': None, 'g': None}, {'b': None, 'g': 'moo'}]], type=dtype)
carr = pa.chunked_array(arr)
# Cast to the same type the array already has
ext_arr = carr.cast(dtype, safe=True)
print(ext_arr)       # Prints "Invalid array", but no exception is raised
print(len(ext_arr))  # Length is 1
print(ext_arr[0])    # Raises ArrowIndexError

Component(s)

Python

jorisvandenbossche commented 2 months ago

@Angel-212 thanks for the report!

The issue also occurs with a plain (non-chunked) array. The repr already shows that the result of the cast is invalid, which is why the subsequent getitem operation raises an error:

In [24]: arr.cast(dtype)
Out[24]: 
<pyarrow.lib.ListArray object at 0x7f0536bbd480>
<Invalid array: List child array invalid: Invalid: Struct child array #0 has length smaller than expected for struct array (1 < 2)>