ExpandingMan / Arrow.jl

DEPRECATED in favor of [JuliaData/Arrow.jl](https://github.com/JuliaData/Arrow.jl)

actions required for IPC use cases? #2

Open davidanthoff opened 6 years ago

davidanthoff commented 6 years ago

I know you are just starting on the writing case, but here is one thing I've been thinking about for a while: it would be really, really neat if this could be used for inter-process communication between Julia and Node.js processes. In particular, I have a whole bunch of scenarios where I have tabular data on the Julia side that I then want to process in a Node.js process with JavaScript. There is a full TypeScript implementation of Arrow, so I kind of have this vision where it might be feasible either to send Arrow data as messages via a pipe over into JavaScript land, or maybe even to share it via some shared memory region. I'm quite hazy about the details of that, but thought I'd bring it up at this point.

ExpandingMan commented 6 years ago

I definitely hope to be able to do things like that. As the design currently stands, all you really need is a pointer to a buffer containing binary data in Arrow format. Except in the simplest cases, there would be a little bit of overhead in making Julia arrays readable in Arrow format. In particular, Julia does not typically store nullable arrays in the Arrow format (although one could of course create a Julia object that does so). The simplest way around this would be to copy data from Julia arrays into an array in the Arrow format, but the copying is obviously quite undesirable. The only alternative I can see would be to create objects that contain both a reference to the Julia array you want to send and a new buffer containing Arrow-formatted metadata such as nulls. Fortunately, as Arrow.jl is right now, one needn't store all parts of an Arrow array in a single Julia object, so I don't think this case would require any rethinking of what I've already written.
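To make the "reference plus separate null buffer" idea concrete, here is a minimal sketch. It is written in Python rather than Julia purely for illustration, and it is not how Arrow.jl is implemented: `NullableView` and `build_validity_bitmap` are hypothetical names. The one detail taken from the Arrow format itself is the validity bitmap layout: one bit per element, least-significant bit first, with a set bit meaning "valid". The buffer padding that Arrow recommends is left out here.

```python
from array import array


def build_validity_bitmap(is_valid):
    """Pack booleans into a bitmap, least-significant bit first (1 = valid)."""
    bitmap = bytearray((len(is_valid) + 7) // 8)
    for i, ok in enumerate(is_valid):
        if ok:
            bitmap[i // 8] |= 1 << (i % 8)
    return bitmap


class NullableView:
    """Hypothetical wrapper: borrows the values buffer, owns only the bitmap."""

    def __init__(self, values, is_valid):
        if len(values) != len(is_valid):
            raise ValueError("values and is_valid must have the same length")
        # memoryview references the existing buffer; nothing is copied.
        self.values = memoryview(values)
        # The only newly allocated data is the (small) validity bitmap.
        self.validity = build_validity_bitmap(is_valid)

    def __len__(self):
        return len(self.values)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        if (self.validity[i // 8] >> (i % 8)) & 1:
            return self.values[i]
        return None  # slot is null; the stored value is ignored


# Existing 64-bit integer data; slots 1 and 4 hold placeholder zeros.
values = array("q", [10, 0, 30, 40, 0, 60, 70, 80, 90])
mask = [True, False, True, True, False, True, True, True, True]
view = NullableView(values, mask)
```

Because `view.values` only references `values`, a later write such as `values[0] = 11` is visible through `view[0]`. The only new allocation is the two-byte bitmap, which is the trade-off described above: no copy of the data, at the cost of carrying a second buffer alongside it.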

The one thing that I am extremely confused about right now is whether there is any standard metadata format that is part of Arrow itself rather than of a particular implementation such as Feather. I think the answer is basically no, although there seems to be some standard associated with sending messages like you described.

In summary, I don't think I understand everything that would be involved in sending inter-process messages using Arrow, but I'm reasonably sure we are on track to eventually be able to do that. Once I finish the basic implementation I can worry about how to describe Julia objects in the Arrow format.

(I'll try to write the README tomorrow. If you become concerned that anything I'm doing would be incompatible with this use case, please let me know.)