apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0

feat(python): Add `Array.from_chunks()` constructor #456

Closed: paleolimbot closed this pull request 4 months ago

paleolimbot commented 4 months ago

This PR adds a public route to construct chunked arrays (and makes the other constructors safer, since they are now user-facing). I use this a lot interactively to test that things work in the chunked case. For nanoarrow to be useful in an "I can help you export things" kind of way, it also needs to be able to do this: regular string and binary arrays use 32-bit offsets, which caps a single array at about 2 GB of data, and string or binary columns larger than that are not uncommon.
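To make the 2 GB limit concrete: a producer that wants to export a large string column has to split it so that each chunk's total byte size fits in a 32-bit signed offset. The sketch below shows one greedy way to do that in plain Python. The helper `chunk_strings_by_bytes` is hypothetical and is not part of nanoarrow; each list it yields is the kind of chunk that could then be handed to `Array.from_chunks()`.

```python
from typing import Iterable, Iterator, List

# Largest offset a 32-bit signed integer can hold (about 2 GiB of data).
INT32_MAX = 2**31 - 1


def chunk_strings_by_bytes(
    values: Iterable[str], max_bytes: int = INT32_MAX
) -> Iterator[List[str]]:
    """Hypothetical helper: yield lists of strings whose total UTF-8 size
    is at most max_bytes, so each list fits in one 32-bit-offset array."""
    chunk: List[str] = []
    chunk_bytes = 0
    for value in values:
        n = len(value.encode("utf-8"))
        if n > max_bytes:
            raise ValueError("a single value exceeds max_bytes")
        # Start a new chunk if adding this value would overflow the offsets.
        if chunk and chunk_bytes + n > max_bytes:
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(value)
        chunk_bytes += n
    if chunk:
        yield chunk


# Small limit for illustration: "abc" + "de" is exactly 5 bytes.
print(list(chunk_strings_by_bytes(["abc", "de", "fgh", "i"], max_bytes=5)))
# [['abc', 'de'], ['fgh', 'i']]
```

The greedy policy fills each chunk as far as possible; a real exporter might instead target a smaller, fixed chunk size, which also keeps individual chunks cheap to process.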

The main safety consideration here is ensuring that all chunks have a schema of the same type, so I added a function to check for that and made sure the check is actually performed.
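The kind of check described above can be sketched as a recursive comparison of type information. This is an illustration under stated assumptions, not nanoarrow's implementation: `SchemaSketch`, `types_equal`, and `check_chunk_schemas` are hypothetical names, and the policy chosen here (compare Arrow C data interface format strings and child field names, ignore the top-level field name and metadata) is one plausible choice; the exact rules nanoarrow applies may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class SchemaSketch:
    """Hypothetical stand-in for an ArrowSchema (not nanoarrow's class)."""

    format: str  # Arrow C data interface format string, e.g. "i" for int32
    name: str = ""
    children: List["SchemaSketch"] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


def types_equal(a: SchemaSketch, b: SchemaSketch, check_name: bool = False) -> bool:
    """Compare type information recursively. The top-level name and all
    metadata are ignored; child (e.g. struct field) names are compared."""
    if a.format != b.format or len(a.children) != len(b.children):
        return False
    if check_name and a.name != b.name:
        return False
    return all(
        types_equal(x, y, check_name=True) for x, y in zip(a.children, b.children)
    )


def check_chunk_schemas(schemas: Sequence[SchemaSketch]) -> None:
    """Raise ValueError if any chunk's type differs from the first chunk's."""
    for i, schema in enumerate(schemas[1:], start=1):
        if not types_equal(schemas[0], schema):
            raise ValueError(f"chunk {i} does not have the same type as chunk 0")


# Two int32 chunks with different field names are accepted...
check_chunk_schemas([SchemaSketch("i", name="x"), SchemaSketch("i", name="y")])
# ...but mixing int32 ("i") with float64 ("g") raises ValueError.
```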

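```python
import nanoarrow as na
import numpy as np

# Each Python list becomes one chunk, converted using the explicit int32 schema
na.Array.from_chunks([[1, 2, 3], [4, 5, 6]], na.int32())

# Chunks can also come from a generator: here 1,000 NumPy float64 arrays of
# 1,000 elements each; no schema is passed, so the type comes from the chunks
na.Array.from_chunks(np.random.random(1000) for _ in range(1000))
```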
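The second example passes a generator and no schema, so the type has to be taken from the chunks themselves without losing any of them. A common Python pattern for this is to pull the first chunk, infer the type from it, and chain it back in front of the remaining chunks. The sketch below assumes a caller-supplied `infer` callable; `peek_schema` is a hypothetical name, and this is not a description of how nanoarrow implements `from_chunks()`.

```python
import itertools
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple


def peek_schema(
    chunks: Iterable[Any],
    schema: Optional[Any],
    infer: Callable[[Any], Any],
) -> Tuple[Any, Iterator[Any]]:
    """Hypothetical helper: return (schema, chunks) without dropping the
    first chunk of a one-shot iterator such as a generator."""
    it = iter(chunks)
    if schema is not None:
        # An explicit schema wins; nothing needs to be consumed.
        return schema, it
    try:
        first = next(it)
    except StopIteration:
        # Nothing to infer from: an explicit schema is required.
        raise ValueError("can't infer a schema from zero chunks") from None
    # Put the consumed chunk back in front of the remaining ones.
    return infer(first), itertools.chain([first], it)


# Stand-in inference: use the Python type name of the first element.
gen = ([float(i)] * 3 for i in range(4))
schema, chunks = peek_schema(gen, None, lambda c: type(c[0]).__name__)
print(schema, len(list(chunks)))  # float 4
```

Chaining the first chunk back in keeps the function single-pass, which matters for generators like the one in the example above that cannot be restarted.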