Open asfimport opened 5 years ago
Sebastien Binet / @sbinet: not saying it wouldn't be advisable nor doable, but: if it's already in a shmem region, why not just use that already?
(and I guess it's kind of implementing: https://issues.apache.org/jira/browse/ARROW-4852)
Nick Poorman / @nickpoorman: https://issues.apache.org/jira/browse/ARROW-4852 is the same use case I'm thinking of.
If you have an Arrow Table in C (or Python) and you want to access the data in Go, you can pass a pointer back from C to the underlying data buffers. However, you still have to collect all the metadata to make use of those buffers. CGO calls are slow, so passing one pointer to the data buffers and one pointer to the serialized metadata would keep the cost of crossing the language boundary roughly constant, instead of growing with the amount of metadata.
I did a simple PoC to demonstrate what it would take to collect all the information from Python and re-materialize it in Go (https://github.com/nickpoorman/go-py-arrow-bridge). The bottleneck is the number of CGO calls required to fetch all the metadata.
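To make the pointer handoff concrete, here is a minimal sketch of wrapping a foreign buffer in a Go slice without copying. It does not use cgo: the "foreign" memory is simulated by a Go-allocated slice, and the names `borrow` and `viewInt64` are hypothetical helpers, not part of Arrow or the PoC above.

```go
package main

import (
	"fmt"
	"unsafe"
)

// borrow simulates the handoff from the foreign side: it exposes the backing
// array of src as a raw pointer plus element count, the way a C API might.
// Hypothetical helper for illustration only.
func borrow(src []int64) (unsafe.Pointer, int) {
	if len(src) == 0 {
		return nil, 0
	}
	return unsafe.Pointer(&src[0]), len(src)
}

// viewInt64 wraps n int64 values starting at ptr in a Go slice without
// copying. The caller must keep the underlying memory alive while the slice
// is in use; with real cgo memory that means not freeing it on the C side.
func viewInt64(ptr unsafe.Pointer, n int) []int64 {
	if ptr == nil || n == 0 {
		return nil
	}
	return unsafe.Slice((*int64)(ptr), n)
}

func main() {
	// Stand-in for a column buffer owned by C or Python.
	foreign := []int64{10, 20, 30, 40}

	ptr, n := borrow(foreign)
	col := viewInt64(ptr, n)

	// A write through the original is visible in the view, which shows
	// that the view aliases the same memory rather than copying it.
	foreign[2] = 99
	fmt.Println(col)
}
```

The view is built from a single pointer and length, so the cost of the data handoff does not depend on the buffer size. What remains is the metadata (types, lengths, null counts), which is where the per-field CGO calls come from.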
Sebastien Binet / @sbinet: ok.
(just nit-picking but to really assess the CGo overhead, one should directly call C, not C++-via-python :P. that said, it's a nice PoC.)
SGTM.
For cases where we have a known shared memory region, it would be great if the ipc.Writer (and by extension ipc.Reader?) had the ability to write out everything but the actual buffers holding the data. That way we can still utilize the ipc mechanisms to communicate without having to serialize all the underlying data across the wire.
This seems like it should be possible since the RecordBatch flatbuffers only contain the metadata and the underlying data buffers are appended later. We just need to skip appending the underlying data buffers. @sbinet thoughts?
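As a rough illustration of the idea, the sketch below writes only buffer descriptors (offset and length into a shared region) and lets the reader resolve them against that region. The framing is a made-up layout loosely modeled on the offset/length pairs in the RecordBatch metadata; it is not the Arrow IPC format, and `bufferRef`, `writeMetadata`, and `readMetadata` are hypothetical names, not functions of the Go `ipc` package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bufferRef locates one data buffer inside a shared memory region.
// Hypothetical layout for illustration; not the Arrow IPC format.
type bufferRef struct {
	Offset uint64
	Length uint64
}

// writeMetadata encodes only the descriptors: a little-endian uint32 count
// followed by (offset, length) pairs. No buffer contents are written.
func writeMetadata(refs []bufferRef) []byte {
	out := make([]byte, 4+16*len(refs))
	binary.LittleEndian.PutUint32(out, uint32(len(refs)))
	for i, r := range refs {
		binary.LittleEndian.PutUint64(out[4+16*i:], r.Offset)
		binary.LittleEndian.PutUint64(out[4+16*i+8:], r.Length)
	}
	return out
}

// readMetadata decodes the descriptors and resolves each one against the
// shared region, returning sub-slices that alias it (no copy).
func readMetadata(msg, shared []byte) ([][]byte, error) {
	if len(msg) < 4 {
		return nil, fmt.Errorf("metadata too short: %d bytes", len(msg))
	}
	n := binary.LittleEndian.Uint32(msg)
	body := msg[4:]
	if uint64(len(body)) != uint64(n)*16 {
		return nil, fmt.Errorf("metadata length mismatch for %d buffers", n)
	}
	bufs := make([][]byte, n)
	for i := range bufs {
		off := binary.LittleEndian.Uint64(body[16*i:])
		length := binary.LittleEndian.Uint64(body[16*i+8:])
		// Reject descriptors that point outside the shared region.
		if off > uint64(len(shared)) || length > uint64(len(shared))-off {
			return nil, fmt.Errorf("buffer %d out of range", i)
		}
		bufs[i] = shared[off : off+length]
	}
	return bufs, nil
}

func main() {
	// Stand-in for a shared memory region that already holds two buffers.
	shared := []byte("validitydatabytes")

	msg := writeMetadata([]bufferRef{{0, 8}, {8, 9}})
	bufs, err := readMetadata(msg, shared)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(msg), string(bufs[0]), string(bufs[1]))
}
```

With two buffers the message is 36 bytes (a 4-byte count plus two 16-byte descriptors), however large the buffers themselves are. A real implementation would presumably reuse the existing flatbuffers metadata and only skip the body, rather than define a new layout like this one.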
Reporter: Nick Poorman / @nickpoorman
Note: This issue was originally created as ARROW-6107. Please see the migration documentation for further details.