paleolimbot closed this 1 month ago.
I hear everybody on the naming thing! "Column" is not great because it doesn't have a precedent here (the closest thing would be pyarrow.Table.column(), which still returns Arrow arrays), and "Builder" is already used to describe the conversion of
There is precedent for the term "convert" in the R bindings ( https://arrow.apache.org/nanoarrow/latest/r/reference/convert_array_stream.html ) and Arrow C++ ( https://github.com/apache/arrow/blob/2dbc5e26dcbc6826b4eb7a330fa8090836f6b727/cpp/src/arrow/util/converter.h#L40 ), and so I gave that terminology a try in the last few commits.
The crux of what these helpers are trying to do is get a stream of arrays (possibly of indeterminate length) out of Arrow land and into some other representation. Because the package has zero dependencies, the default representation has to come from the Python standard library, which means "pybuffer or list". In the R bindings you can do things like:
```r
convert_array_stream(stream)                   # default conversion
convert_array_stream(stream, tibble::tibble()) # explicit output prototype
```
Here in the Python bindings, visitable.convert() could do the same thing (although it won't in this PR because that's a can of worms, and maybe never will if nobody ends up using the high-level interface):
```python
import numpy as np

array.convert()          # default conversion (hypothetical API)
array.convert(np.int32)  # ...would get you a NumPy array with dtype int32
```
I will also mark these as "experimental" so that it's clear we're still settling the terminology, behaviour, and scope.
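To make the "pybuffer or list" default and the idea of an explicit output prototype more concrete, here is a minimal, stdlib-only sketch. The convert function, its to parameter, and the _CONVERTERS registry are hypothetical names chosen for illustration (they are not nanoarrow's API), and plain Python lists stand in for a stream of Arrow arrays.

```python
from array import array
from typing import Any, Callable, Dict, Iterable, Optional

# Hypothetical registry of output prototypes, loosely mirroring the R
# bindings' explicit output prototype. Only stdlib types are registered.
_CONVERTERS: Dict[Any, Callable[[Iterable], Any]] = {list: list, tuple: tuple}


def convert(chunks: Iterable[Iterable], to: Optional[Any] = None) -> Any:
    """Collect a stream of chunks (of unknown total length) into one object.

    to=None gives the default list; a string is treated as an array.array
    typecode, i.e. a buffer-protocol "pybuffer"; any other prototype is
    looked up in _CONVERTERS.
    """
    if isinstance(to, str):
        # array.array grows as each chunk arrives, so the total length
        # never has to be known up front.
        out = array(to)
        for chunk in chunks:
            out.extend(chunk)
        return out

    # Flatten the chunks lazily into a single iterable of values.
    values = (value for chunk in chunks for value in chunk)
    if to is None:
        return list(values)
    converter = _CONVERTERS.get(to)
    if converter is None:
        raise TypeError(f"no converter registered for prototype {to!r}")
    return converter(values)


stream = [[1, 2], [3], [4, 5]]  # stand-in for a stream of arrays
convert(stream)          # [1, 2, 3, 4, 5]
convert(stream, tuple)   # (1, 2, 3, 4, 5)
convert(stream, "i")     # array('i', [1, 2, 3, 4, 5])
```

Dispatching on the prototype keeps the default path dependency-free while leaving room for optional converters (for example one keyed on a NumPy dtype) to be registered only when that package is installed.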
This PR implements building columns buffer-wise for the types where this makes sense. It also implements a few other changes:

- item_size was renamed to itemsize to match the memoryview property name
- ArrayViewVisitable mixin such that they are available in both the Array and ArrayView without duplicating documentation.

Functionally this means that the Array and ArrayStream now have to_column() and to_column_list() methods that do something that more closely matches what somebody would expect.

A quick demo:
This will basically get you data frame conversion:
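A minimal stdlib-only sketch of what buffer-wise data frame conversion could look like, assuming each record batch has already been reduced to a mapping of column name to the raw bytes of a non-nullable, fixed-width data buffer. The batches_to_columns helper and the x/y columns are hypothetical; this is neither nanoarrow's API nor the demo from this PR.

```python
from array import array
from typing import Dict, Iterable


def batches_to_columns(
    batches: Iterable[Dict[str, bytes]], fmt: str = "i"
) -> Dict[str, memoryview]:
    # Hypothetical helper: build each column buffer-wise by appending every
    # batch's data buffer wholesale, rather than decoding values one by one.
    buffers: Dict[str, bytearray] = {}
    for batch in batches:
        for name, data in batch.items():
            buffers.setdefault(name, bytearray()).extend(data)
    # memoryview.cast() reinterprets the bytes as typed values without
    # copying; fmt is a struct-style format such as "i" (native C int).
    return {name: memoryview(buf).cast(fmt) for name, buf in buffers.items()}


batches = [
    {"x": array("i", [1, 2]).tobytes(), "y": array("i", [10, 20]).tobytes()},
    {"x": array("i", [3]).tobytes(), "y": array("i", [30]).tobytes()},
]
columns = batches_to_columns(batches)
print({name: col.tolist() for name, col in columns.items()})
# {'x': [1, 2, 3], 'y': [10, 20, 30]}
print(columns["x"].itemsize)  # bytes per value, the memoryview property itemsize mirrors
```

Appending whole buffers keeps per-value Python overhead out of the loop. A fuller version would also need to handle validity bitmaps, offset buffers for variable-length types, and bit-packed booleans, which is why buffer-wise building only applies to some types.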