This PR updates the logic that creates a "shallow copy" of an ArrowArray. Before, it simply made a shallow copy of the outer array, which works in most cases. However, the spec allows for "moving" child arrays, which means that the guarantee of a valid outer array does not guarantee anything about the validity of the ArrowArray* child pointers. It's difficult, but possible, to trigger this in Python (below); however, I think this sort of code is common for importers (because it allows the lifecycle of columns in an array to be independent).
import nanoarrow as na
import pyarrow as pa
# Given some array
array = na.c_array_from_buffers(
na.struct({"col1": na.int32()}),
3,
[None],
children=[na.c_array([1, 2, 3], na.int32())]
)
user_array = na.Array(array)
user_array
#> nanoarrow.Array<int32>[3]
#> {'col1': 1}
#> {'col1': 2}
#> {'col1': 3}
# totally valid shallow copy of this array
schema_capsule, array_capsule = array.__arrow_c_array__()
array2 = na.c_array(array_capsule, schema_capsule)
# A consumer is technically allowed to move a child array
x = pa.Array._import_from_c(array2.child(0)._addr(), pa.int32())
del array
del x
# With the previous shallow copy implementation, this could segfault or fail
list(user_array.iter_py())
# Instead of a segfault I tend to get:
#> NanoarrowException: ArrowBasicArrayStreamValidate() failed (22): Expected int32 array buffer 1 to have size >= 12 bytes but found buffer with 0 bytes
This PR updates the logic that creates a "shallow copy" of an
ArrowArray
. Before, it simply made a shallow copy of the outer array, which works in most cases. However, the spec allows for "moving" child arrays, which means that the guarantee of a valid outer array does not guarantee anything about the validity of theArrowArray*
child pointers. It's difficult, but possible, to trigger this in Python (below); however, I think this sort of code is common for importers (because it allows the lifecycle of columns in an array to be independent).After this PR the original array remains valid:
This does, however, have a non-trivial effect when making a shallow copy where a lot of children are involved: