Open joseph-isaacs opened 1 week ago
This is the current behavior on main too. slice computes a zero-copy slice of the array by updating the length and/or offset where necessary. For a struct array only the parent is updated; the child arrays keep their full length:
>>> import pyarrow as pa
>>> import nanoarrow as na
>>> original_array = pa.array([{'a': 0}, {'a': 1}, {'a': 2}])
>>> sliced_array = original_array.slice(0,1)
>>> sliced_array
<pyarrow.lib.StructArray object at 0x764b3cf5d2a0>
-- is_valid: all not null
-- child 0 type: int64
[
0
]
>>> na.array(sliced_array).inspect()
<ArrowArray struct<a: int64>>
- length: 1
- offset: 0
- null_count: 0
- buffers[1]:
  - validity <bool[0 b] >
- dictionary: NULL
- children[1]:
  'a': <ArrowArray int64>
    - length: 3
    - offset: 0
    - null_count: 0
    - buffers[2]:
      - validity <bool[0 b] >
      - data <int64[24 b] 0 1 2>
    - dictionary: NULL
    - children[0]:
>>> na.array(original_array).inspect()
<ArrowArray struct<a: int64>>
- length: 3
- offset: 0
- null_count: 0
- buffers[1]:
  - validity <bool[0 b] >
- dictionary: NULL
- children[1]:
  'a': <ArrowArray int64>
    - length: 3
    - offset: 0
    - null_count: 0
    - buffers[2]:
      - validity <bool[0 b] >
      - data <int64[24 b] 0 1 2>
    - dictionary: NULL
    - children[0]:
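To make the point above concrete, here is a toy, stdlib-only model of how a zero-copy struct slice behaves. It is only a sketch: ToyStructArray and its methods are hypothetical names invented for illustration and are not how pyarrow or nanoarrow are implemented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyStructArray:
    """Hypothetical model: children hold the data, the parent holds offset/length."""

    children: Dict[str, List[int]]
    offset: int
    length: int

    def slice(self, offset: int, length: int) -> "ToyStructArray":
        # Zero-copy: reuse the same child lists and move only the parent's window.
        return ToyStructArray(self.children, self.offset + offset, length)

    def logical_values(self) -> Dict[str, List[int]]:
        # A consumer has to apply the parent's window to every child itself.
        end = self.offset + self.length
        return {name: values[self.offset:end] for name, values in self.children.items()}


original = ToyStructArray({"a": [0, 1, 2]}, offset=0, length=3)
sliced = original.slice(0, 1)

# The child is shared and still has three values; only the parent shrank.
print(len(sliced.children["a"]), sliced.length)
print(sliced.logical_values())
```

A consumer that reads the children without applying the parent's offset and length sees all three values, which matches the child length of 3 in the sliced inspect() output.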
I understand the use case, but I am unsure what the behavior should be when generating the RecordBatch
if the slice has updated the offset. For example:
>>> pa.table(pa.array([{'a': 0}, {'a': 1}, {'a': 2}]).slice(1,2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 6172, in pyarrow.lib.table
    batch = record_batch(data, schema)
  File "pyarrow/table.pxi", line 5991, in pyarrow.lib.record_batch
    batch = RecordBatch._import_from_c_device_capsule(schema_capsule, array_capsule)
  File "pyarrow/table.pxi", line 4002, in pyarrow.lib.RecordBatch._import_from_c_device_capsule
    batch = GetResultValue(ImportDeviceRecordBatch(c_array, c_schema))
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
    return check_status(status)
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
    raise convert_status(status)
pyarrow.lib.ArrowInvalid: ArrowArray struct has non-zero offset, cannot be imported as RecordBatch
>>> pa.array([{'a': 0}, {'a': 1}, {'a': 2}]).slice(1,2)
<pyarrow.lib.StructArray object at 0x764b3cf5d8a0>
-- is_valid: all not null
-- child 0 type: int64
[
1,
2
]
>>>
@jorisvandenbossche @pitrou
Describe the bug, including details regarding any error messages, version, and platform.
Currently, on pyarrow 17.0.0, creating a table from a sliced struct array ignores the slice bounds.
I expect
Component(s)
Python