Open mitar opened 7 years ago
So this is something I definitely want to support.
For capnproto structs that don't use pointers, this is pretty easy, since each message will be a fixed size (assuming they're all written with the same version of the schema). Assuming they're fixed size, we could use strides in the constructor of an ndarray (see https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html). We may also be able to leverage Cython a bit here and use numpy's C API to be even more efficient.
For non-simple capnproto messages (i.e. ones with nested structs, strings, or arrays), we'd have to explore what's possible with the numpy C API.
If you fit into the simple struct case, let me know and I can probably cook up something simple.
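To make the fixed-size idea concrete, here is a minimal sketch of viewing a buffer of same-sized records as a structured ndarray without copying. The dtype below is purely illustrative and is not the actual Cap'n Proto wire layout; a real implementation would derive the layout from the schema.

```python
import numpy as np

# Illustrative record layout -- NOT the Cap'n Proto wire format.
record = np.dtype([("x", "<f8"), ("y", "<f8"), ("label", "<i4")])

# Pretend this buffer holds three fixed-size messages back to back.
buf = bytearray(record.itemsize * 3)

# Zero-copy: reinterpret the bytes as an array of records,
# no per-message deserialization.
arr = np.frombuffer(buf, dtype=record)
arr["x"] = [1.0, 2.0, 3.0]
print(arr["x"].sum())  # operates directly on the message bytes
```

The same trick generalizes via the `strides` argument to `np.ndarray` if the field you want is embedded at a fixed offset inside each record.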
I am glad you like the idea. Here are some internals on how numpy stores data and this.
So in my case I was thinking more of allowing arbitrary array sizes to be passed, so that I could make machine learning algorithms remote: the algorithms are fixed, but the arguments (including array sizes and their data) change. A message in my case would be some serialization of `*args` and `**kwargs`. Not sure if Cap'n Proto supports that.
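One possible framing, sketched below: serialize the whole `(*args, **kwargs)` pair into an opaque byte string and carry it in a single blob field (Cap'n Proto does have a `Data` type for opaque bytes). The helper names here are hypothetical, and this loses zero-copy access to the arrays themselves; it only illustrates the envelope.

```python
import pickle

# Hypothetical helpers: pack arbitrary call arguments into one opaque
# payload that could live in, e.g., a Cap'n Proto `Data` field.
def pack_call(*args, **kwargs):
    return pickle.dumps((args, kwargs))

def unpack_call(payload):
    args, kwargs = pickle.loads(payload)
    return args, kwargs

payload = pack_call(1, 2, mode="train")
args, kwargs = unpack_call(payload)
print(args, kwargs)  # (1, 2) {'mode': 'train'}
```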
@mitar Have you made any progress on this problem? I have a similar application, using Cap'n Proto to transfer training messages between a learning algorithm and an environment. There is already a message containing a numpy array, and I am not sure whether it's OK to continue using Cap'n Proto.
I am using Arrow now for this: https://arrow.apache.org/
Thank you for replying.
Numpy arrays can potentially be large, and serializing them can be a performance hit. I am looking into ways to efficiently pass numpy arrays between processes, and I was wondering whether Cap'n Proto could help here. It could probably wrap a numpy array in some way, given that numpy arrays are fairly dense data structures as well (not sure about their sparse representations). I am still trying to understand Cap'n Proto, so any suggestion on how to do that would be welcome.
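Since a dense numpy array is just raw bytes plus a little metadata (dtype and shape), one hedged sketch of "wrapping" it is to send exactly those three things, which is the kind of payload a Cap'n Proto message could carry (e.g. shape and dtype as fields next to a `Data` blob). The function names below are illustrative, not an existing API; note that `tobytes()` copies on the sending side, while `frombuffer` avoids a copy on the receiving side.

```python
import numpy as np

# Hypothetical wire format: (dtype string, shape tuple, raw bytes).
def to_wire(arr):
    return str(arr.dtype), arr.shape, arr.tobytes()

def from_wire(dtype, shape, data):
    # frombuffer reinterprets the payload bytes without copying them.
    return np.frombuffer(data, dtype=dtype).reshape(shape)

a = np.arange(6, dtype=np.float64).reshape(2, 3)
b = from_wire(*to_wire(a))
print((a == b).all())  # round-trips the array contents
```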