Open kylebarron opened 1 month ago
Related issue: https://github.com/apache/arrow/issues/38137 (it also shows a workaround for zero-copy converting such an object to Arrow today)
While this does seem unexpected, the underlying issue is that we simply don't have specific support for objects implementing the buffer protocol, only for numpy arrays and pandas array-likes specifically. As a result, we treat the memoryview as a generic Python sequence, essentially converting it to a list of Python floats before converting to Arrow.
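To illustrate why the generic-sequence path loses the element width, here is a minimal stdlib-only sketch. It does not call pyarrow; it only shows that a memoryview carries the type information in its `format` and `itemsize`, while iterating it yields plain Python floats. The `array.array("f", ...)` source is just an assumed stand-in for any object exporting a float32 buffer.

```python
from array import array

# Assumed stand-in for any buffer-protocol object holding 32-bit floats.
values = array("f", [1.5, 2.5, 3.5])
view = memoryview(values)

# The buffer itself still describes its element type and width.
print(view.format, view.itemsize)

# Iterating the view as a generic sequence yields plain Python floats,
# so a consumer that only sees this list cannot tell the source was 32-bit.
as_list = list(view)
print(as_list, type(as_list[0]))
```

A converter that only sees `as_list` has to infer a type from Python objects, which would explain a wider default type being chosen.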
We should expand pa.array() to support objects implementing the buffer protocol (e.g. by converting to a numpy array in that case and using the code path for numpy).
I am not sure you can easily check from Python whether an object supports the buffer protocol, but given that this function lives in Cython, I assume we can use something like PyObject_CheckBuffer.
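For what it's worth, one rough way to probe this from pure Python is to try constructing a `memoryview` and catch the `TypeError`. The helper name `supports_buffer` below is hypothetical and not part of pyarrow; this is only a sketch of the idea, not a claim about how `pa.array()` should implement the check.

```python
from array import array


def supports_buffer(obj) -> bool:
    """Return True if `obj` can export a buffer (hypothetical helper)."""
    try:
        # memoryview() raises TypeError for objects without buffer support.
        # The context manager releases the view again right away.
        with memoryview(obj):
            return True
    except TypeError:
        return False


print(supports_buffer(b"abc"))            # bytes export a buffer
print(supports_buffer(array("i", [1])))   # array.array exports a buffer
print(supports_buffer([1, 2, 3]))         # plain lists do not
```

Unlike a C-level check, this actually requests a buffer from the object, so it is a heavier test than just asking whether the type implements the protocol.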
Describe the bug, including details regarding any error messages, version, and platform.
pyarrow seems to be applying some upcasting when importing data via the buffer protocol. This was unexpected behavior to me and could be considered a bug.
pyarrow seems to cast:
- float32 -> float64
- int32 -> int64
- uint64 -> int64
As expected:
Unexpected casts:
Component(s)
Python