Open JerAguilon opened 1 week ago
I have a use case where I want to send a fairly small arrays/scalars via Arrow IPC. Memory efficiency is quite important for us.
One simple way of doing this would be to put the ints in a Record batch and serialize.
However, I notice that serializing an int, 3 chars, and a bool takes a whopping 512 bytes on my machine:
https://gist.github.com/JerAguilon/8c70107a576c0b04b5846ad8207651e7
STDOUT: batch: pyarrow.RecordBatch a: int64 b: string c: bool, num_rows: 1 buf.size: 512
Things like the schema are well known ahead of time, I'd like to minimize as much data in the payload as possible.
In Arrow IPC (the C++ library works for me), is there some way to serialize just a scalar or array? I see only a RecordBatchWriter exposed:
RecordBatchWriter
https://arrow.apache.org/docs/cpp/api/ipc.html#_CPPv416MakeStreamWriterPN2io12OutputStreamERKNSt10shared_ptrI6SchemaEERK15IpcWriteOptions
C++
Describe the usage question you have. Please include as many useful details as possible.
I have a use case where I want to send a fairly small arrays/scalars via Arrow IPC. Memory efficiency is quite important for us.
One simple way of doing this would be to put the ints in a Record batch and serialize.
However, I notice that serializing an int, 3 chars, and a bool takes a whopping 512 bytes on my machine:
https://gist.github.com/JerAguilon/8c70107a576c0b04b5846ad8207651e7
Things like the schema are well known ahead of time, I'd like to minimize as much data in the payload as possible.
In Arrow IPC (the C++ library works for me), is there some way to serialize just a scalar or array? I see only a
RecordBatchWriter
exposed:https://arrow.apache.org/docs/cpp/api/ipc.html#_CPPv416MakeStreamWriterPN2io12OutputStreamERKNSt10shared_ptrI6SchemaEERK15IpcWriteOptions
Component(s)
C++