apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0

ArrowArrayViewSetArray fails for BinaryView array with no variadic buffers #634

Closed WillAyd closed 6 days ago

WillAyd commented 1 week ago

Testing out the new implementation with data like:

import pyarrow as pa
arr = pa.array(["foo", "bar"], type=pa.string_view())

an extension that calls ArrowArrayViewSetArray fails with the error: Expected array with 2 buffer(s) but found 3

While debugging, it looks like the Arrow C bridge exports a third buffer (n_buffers = 3) even though the array has no variadic data buffers. This can be confirmed from gdb:

Thread 1 "python" hit Breakpoint 2, ArrowArrayViewSetArrayInternal (array_view=0x7fffffffaf60, array=0x7ffff0e862d0, error=0x7fffffffb020) at /home/willayd/clones/nanoarrow_mre/build/_deps/nanoarrow-project-src/src/nanoarrow/common/array.c:764
764     const int64_t n_buffers = array->n_buffers;
(gdb) p ((union ArrowBinaryView*)array->buffers[1])[0]
$2 = {inlined = {size = 3, data = "foo\000\000\000\000\000\000\000\000"}, ref = {size = 3, prefix = "foo", buffer_index = 0, offset = 0}, 
  alignment_dummy = 31366206292230147}
(gdb) p ((union ArrowBinaryView*)array->buffers[1])[1]
$3 = {inlined = {size = 3, data = "bar\000\000\000\000\000\000\000\000"}, ref = {size = 3, prefix = "bar", buffer_index = 0, offset = 0}, 
  alignment_dummy = 32195220879704067}
(gdb) p *array
$4 = {length = 2, null_count = 0, offset = 0, n_buffers = 3, n_children = 0, buffers = 0x7ffff0e58b00, children = 0x8, dictionary = 0x0, 
  release = 0x7ffff245afa0 <polars_arrow::ffi::array::c_release_array>, private_data = 0x7ffff0e86280}
(gdb) p array->buffers[2]
$5 = (const void *) 0x8

Is that extra buffer supposed to be part of the C Data Interface, or is it a bug in the Arrow bridge?
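
For anyone reading the gdb output above, here is a minimal Python sketch of how one 16-byte view element can be decoded. It assumes the little-endian layout of the Arrow binary/string view format (a 4-byte size, then either up to 12 inline bytes or a 4-byte prefix plus buffer index and offset). The helper name decode_view is hypothetical and not part of nanoarrow or pyarrow.

```python
import struct

INLINE_MAX = 12  # values of at most 12 bytes are stored inline in the view


def decode_view(raw: bytes) -> dict:
    """Decode one 16-byte little-endian binary/string view element."""
    if len(raw) != 16:
        raise ValueError("a view element is exactly 16 bytes")
    (size,) = struct.unpack_from("<i", raw, 0)
    if size <= INLINE_MAX:
        # Short value: the bytes live directly in the view, zero padded.
        return {"size": size, "inlined": raw[4:4 + size]}
    # Long value: the view points into one of the variadic data buffers.
    prefix, buffer_index, offset = struct.unpack_from("<4sii", raw, 4)
    return {
        "size": size,
        "prefix": prefix,
        "buffer_index": buffer_index,
        "offset": offset,
    }


# Rebuild the first element from the gdb session: size = 3, data = "foo".
foo = struct.pack("<i12s", 3, b"foo")
print(decode_view(foo))
# The first 8 bytes read as an int64 correspond to gdb's alignment_dummy.
print(struct.unpack_from("<q", foo, 0)[0])
```

Both elements in the session are 3 bytes long, so they are stored inline and never reference a variadic data buffer.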

paleolimbot commented 1 week ago

Ah, I think it's supposed to be there: https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowArray.n_buffers
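
As a rough sketch of what that section implies (my reading of the linked docs, not nanoarrow's actual code): for binary view and string view arrays the C Data Interface appends one extra buffer holding the size of each variadic data buffer as int64, so the count is never just 2. The function name below is hypothetical.

```python
def expected_view_n_buffers(n_variadic_buffers: int) -> int:
    """Expected ArrowArray.n_buffers for a binary/string view array.

    Assumes the C Data Interface layout: validity bitmap, views buffer,
    the variadic data buffers, then one trailing buffer of int64 sizes.
    """
    if n_variadic_buffers < 0:
        raise ValueError("number of variadic buffers cannot be negative")
    validity_and_views = 2
    variadic_sizes_buffer = 1  # present even when there are no data buffers
    return validity_and_views + n_variadic_buffers + variadic_sizes_buffer


# The array in this report has only inline values, so no variadic buffers.
print(expected_view_n_buffers(0))
```

With zero variadic data buffers this gives 3, which matches the n_buffers = 3 seen in gdb, so a consumer that expects exactly 2 buffers would reject a valid array.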

WillAyd commented 1 week ago

OK great - will take a look