apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0

feat(python): Add user-facing ArrayStream class #439

Closed: paleolimbot closed this 2 months ago

paleolimbot commented 2 months ago

This class wraps an ArrowArrayStream in a user-facing interface whose methods return Schema and Array objects. It also provides a more ergonomic way to use ipc.Stream.

import nanoarrow as na

na.ArrayStream([1, 2, 3], na.int32())
#> <nanoarrow.ArrayStream: Schema(INT32)>

na.ArrayStream([1, 2, 3], na.int32()).read_all()
#> nanoarrow.Array<int32>[3]
#> 1
#> 2
#> 3

url = "https://github.com/apache/arrow-experiments/raw/main/data/arrow-commits/arrow-commits.arrows"
na.ArrayStream.from_url(url).read_all()
#> nanoarrow.Array<struct<commit: string, time: timestamp('us', 'UTC'), ...>[15487]
#> {'commit': '49cdb0fe4e98fda19031c864a18e6156c6edbf3c', 'time': datetime.datet...
#> {'commit': '1d966e98e41ce817d1f8c5159c0b9caa4de75816', 'time': datetime.datet...
#> {'commit': '96f26a89bd73997f7532643cdb27d04b70971530', 'time': datetime.datet...
#> {'commit': 'ee1a8c39a55f3543a82fed900dadca791f6e9f88', 'time': datetime.datet...
#> {'commit': '3d467ac7bfae03cf2db09807054c5672e1959aec', 'time': datetime.datet...
#> {'commit': 'ef6ea6beed071ed070daf03508f4c14b4072d6f2', 'time': datetime.datet...
#> {'commit': '53e0c745ad491af98a5bf18b67541b12d7790beb', 'time': datetime.datet...
#> {'commit': '3ba6d286caad328b8572a3b9228045da8c8d2043', 'time': datetime.datet...
#> {'commit': '4ce9a5edd2710fb8bf0c642fd0e3863b01c2ea20', 'time': datetime.datet...
#> {'commit': '2445975162905bd8d9a42ffc9cd0daa0e19d3251', 'time': datetime.datet...
#> ...and 15477 more items

jorisvandenbossche commented 2 months ago

Nice!