capnproto / pycapnp

Cap'n Proto serialization/RPC system - Python bindings
BSD 2-Clause "Simplified" License

Compatibility with numpy arrays #148

Open mitar opened 7 years ago

mitar commented 7 years ago

Numpy arrays can be large, and serializing them can be a performance hit. I am looking into ways to efficiently pass numpy arrays between processes, and I was wondering whether Cap'n Proto could help here. It could probably wrap a numpy array in some way, given that numpy arrays are pretty dense data structures as well (I am not sure about their sparse representation). I am still trying to understand Cap'n Proto, so any suggestion on how to do that would be welcome.

jparyani commented 7 years ago

So this is something I definitely want to support.

For capnproto structs that don't use pointers, this is pretty easy since each message will be a fixed size (assuming they're all written with the same version of a schema). Assuming they're fixed size, we could use strides in the constructor of an ndarray (see https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html). We may also be able to leverage Cython a bit here and use numpy's C API to be even more efficient.
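To illustrate the strides idea, here is a minimal sketch. It does not use pycapnp or the real Cap'n Proto wire format (which adds segment framing and struct pointers). Instead it assumes a hypothetical fixed-size 16-byte record packed with the standard `struct` module, and shows how `numpy.ndarray` can view one field across all records without copying. The names `RECORD`, `pack_records` and `view_field` are made up for this example.

```python
import struct

import numpy as np

# Hypothetical fixed-size record (an assumption, not a Cap'n Proto layout):
# little-endian float64, then uint32, then 4 padding bytes = 16 bytes.
RECORD = struct.Struct("<dI4x")


def pack_records(rows):
    """Pack (float, int) pairs into one contiguous bytes buffer."""
    return b"".join(RECORD.pack(x, n) for x, n in rows)


def view_field(buf, count, dtype, offset):
    """View a single field of every record as a 1-D array, without copying.

    The stride is the record size, so consecutive elements are read
    from the same offset in consecutive records.
    """
    return np.ndarray(
        shape=(count,),
        dtype=dtype,
        buffer=buf,
        offset=offset,
        strides=(RECORD.size,),
    )


rows = [(1.5, 10), (2.5, 20), (3.5, 30)]
buf = pack_records(rows)

xs = view_field(buf, len(rows), "<f8", 0)  # the float64 field
ns = view_field(buf, len(rows), "<u4", 8)  # the uint32 field
print(xs, ns)
```

Because the buffer here is an immutable `bytes` object, the resulting arrays are read-only views; a writable view would need a writable buffer such as a `bytearray`.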

For non-simple capnproto messages (i.e. ones with nested structs, strings, or arrays), we'd have to explore what's possible in the numpy C API.

If you fit into the simple struct case, let me know and I can probably cook up something simple.

mitar commented 7 years ago

I am glad you like the idea. Here are some internals on how numpy stores data and this.

mitar commented 7 years ago

In my case I was thinking more of passing arrays of arbitrary size, so that I could run machine learning algorithms remotely, where the algorithms are fixed but the arguments (including array sizes and their data) change. So a message in my case would be some serialization of *args and **kwargs. I am not sure if Cap'n Proto supports that.
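One way to picture the arbitrary-size case is to reduce each array to three fields: a dtype string, a shape list, and a raw byte blob. The sketch below is an assumption about how this could be done, not something pycapnp provides. It uses a plain dict as a stand-in for a message; in a Cap'n Proto schema these fields could plausibly map to `Text`, `List(UInt64)` and `Data`. The function names `array_to_fields` and `fields_to_array` are hypothetical.

```python
import numpy as np


def array_to_fields(arr):
    """Reduce an array to dtype string, shape, and raw bytes.

    Assumes a plain numeric dtype; object arrays cannot be
    round-tripped through raw bytes this way.
    """
    arr = np.ascontiguousarray(arr)
    return {
        "dtype": arr.dtype.str,    # e.g. "<f4", includes byte order
        "shape": list(arr.shape),
        "data": arr.tobytes(),     # this step copies the buffer
    }


def fields_to_array(fields):
    """Rebuild a read-only array view over the received bytes."""
    flat = np.frombuffer(fields["data"], dtype=np.dtype(fields["dtype"]))
    return flat.reshape(fields["shape"])


original = np.arange(6, dtype=np.float32).reshape(2, 3)
restored = fields_to_array(array_to_fields(original))
print(restored)
```

On the receiving side `np.frombuffer` avoids a copy, which is the part that matters most for large arrays; the sending side still copies once in `tobytes()`.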

charleschen6 commented 5 years ago

@mitar Have you made any progress on this problem? I have the same use case: using Cap'n Proto to transfer training messages between a learning algorithm and an environment. One of the messages does contain a numpy array, and I am not sure whether it's OK to continue using Cap'n Proto.

mitar commented 5 years ago

I am using Arrow now for this: https://arrow.apache.org/

charleschen6 commented 5 years ago

Thank you for replying.