apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0
14.29k stars 3.47k forks

[Python] Extend PyCapsule interface with support for KeyValueMetadata object #43287

Open noahfrn opened 2 months ago

noahfrn commented 2 months ago

Describe the enhancement requested

Hi there,

I'm writing a C++ application that uses an Arrow IPC decoder to decode an Arrow IPC stream, and I'm using pybind11 to add Python support. I'd like to pass KeyValueMetadata objects between C++ and Python transparently and with zero copies, but I cannot access the current Cython wrap and unwrap utilities from the pybind11 layer.

My current workaround is to convert these objects into a Python dict / C++ std::unordered_map, but I'd like to be able to cast directly between arrow::KeyValueMetadata and pyarrow.lib.KeyValueMetadata objects, as I'm currently doing with Schema and RecordBatch objects.

The Arrow C data interface doesn't currently include a structure definition for KeyValueMetadata objects, which I imagine would need to be added for this to work properly.

Happy to discuss this issue and why this may or may not be possible! Thanks again for all your work on Arrow.

Component(s)

C++, Python

kylebarron commented 1 month ago

You can pass an empty schema with custom key-value schema metadata?

noahfrn commented 1 month ago

I'm trying to get this to work with the arrow::ipc::Listener interface, so that wouldn't really work here. I've switched to just converting the KeyValueMetadata object into a std::unordered_map.