apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0

feat(python): support missing values (None) in building an array from an iterable #424

Closed: jorisvandenbossche closed this issue 2 months ago

jorisvandenbossche commented 3 months ago

Building an array with na.Array([0, 1, 2], na.int64()) works fine, but as soon as the iterable contains a None, construction fails:

>>> import nanoarrow as na
>>> 
>>> arr = na.Array([0, 1, 2], na.int64())
>>> arr
nanoarrow.Array<int64>[3]
0
1
2
>>> arr = na.Array([0, 1, 2, None], na.int64())
Traceback (most recent call last):
  File "/home/joris/scipy/repos/arrow-nanoarrow/python/src/nanoarrow/c_lib.py", line 363, in c_array_stream
    array = c_array(obj, schema=schema)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/joris/scipy/repos/arrow-nanoarrow/python/src/nanoarrow/c_lib.py", line 179, in c_array
    return _c_array_from_iterable(obj, schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/joris/scipy/repos/arrow-nanoarrow/python/src/nanoarrow/c_lib.py", line 622, in _c_array_from_iterable
    buffer, n_values = _c_buffer_from_iterable(obj, schema)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/joris/scipy/repos/arrow-nanoarrow/python/src/nanoarrow/c_lib.py", line 644, in _c_buffer_from_iterable
    n_values_written = builder.write_elements(obj)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/nanoarrow/_lib.pyx", line 1923, in nanoarrow._lib.CBufferBuilder.write_elements
struct.error: required argument is not an integer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/joris/scipy/repos/arrow-nanoarrow/python/src/nanoarrow/array.py", line 153, in __init__
    with c_array_stream(obj, schema=schema) as stream:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/joris/scipy/repos/arrow-nanoarrow/python/src/nanoarrow/c_lib.py", line 366, in c_array_stream
    raise TypeError(
TypeError: Can't convert object of type list to nanoarrow.c_array_stream or nanoarrow.c_array

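(As in https://github.com/apache/arrow-nanoarrow/issues/423, the final error message is unclear: the TypeError only says that the list can't be converted, and doesn't mention the None value that caused the underlying struct.error.)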
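
For illustration, here is a minimal pure-Python sketch of one way None could be handled when packing int64 values: write a placeholder into the data buffer and record validity in a separate bitmap, following the Arrow columnar convention (least-significant-bit order, 1 = valid). The function name pack_int64_with_nulls is hypothetical and this is not nanoarrow's implementation; it only uses the standard-library struct module, which is also where the struct.error in the traceback above comes from.

```python
import struct


def pack_int64_with_nulls(values):
    """Pack an iterable of int-or-None into (validity, data, null_count).

    Hypothetical helper for illustration only, not part of nanoarrow.
    """
    data = bytearray()
    validity = bytearray()
    null_count = 0
    for i, value in enumerate(values):
        if i % 8 == 0:
            # Start a new validity byte every 8 elements (all bits unset).
            validity.append(0)
        if value is None:
            # Nulls get a placeholder slot; the validity bit stays 0.
            null_count += 1
            data += struct.pack("<q", 0)
        else:
            # Mark the element as valid (LSB bit order) and write its value.
            validity[i // 8] |= 1 << (i % 8)
            data += struct.pack("<q", value)
    return bytes(validity), bytes(data), null_count


# Passing None straight to struct.pack raises struct.error, as in the report.
try:
    struct.pack("<q", None)
    raised = False
except struct.error:
    raised = True

validity, data, null_count = pack_int64_with_nulls([0, 1, 2, None])
```

With the input from the report, the first three validity bits are set and the fourth is left unset, so the None no longer reaches struct.pack.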