apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0
13.96k stars 3.41k forks source link

[C++][Compute] Support sharing KernelState between processes #26586

Open asfimport opened 3 years ago

asfimport commented 3 years ago

From the mailing list discussion: https://lists.apache.org/thread.html/ra2a8f9fc5cf621bd79a6db4e578ad133dd921722f24e4c220e4ba07f%40%3Cdev.arrow.apache.org%3E

In a distributed context, it would be useful to send a KernelState from one process to another. For example when distributing computation of mean, the completed states of each process must be merged before finalization. This could be supported by adding to the KernelState interface (KernelState::{Serialize, Deserialize} or by adding functions to ScalarAggregateKernel.

Reporter: Ben Kietzman / @bkietz

Note: This issue was originally created as ARROW-10630. Please see the migration documentation for further details.

asfimport commented 3 years ago

Ben Kietzman / @bkietz: @wesm

asfimport commented 3 years ago

Wes McKinney / @wesm: API design aside, this is a standard problem that must be solved to create distributed query engines. I suggest building either a Flatbuffers or Protocol Buffers-based serialization protocol that can be used to serialize different objects in the arrow::compute namespace.