Open ameyc opened 4 months ago
Should we support serialization in substrait?
Maybe extend it to serialize AggregateExec? https://github.com/apache/datafusion/blob/16a3148354e81e1ae4e2aebdd83c07799164ac14/datafusion/substrait/src/physical_plan/producer.rs#L122-L124
The way we solved this was to add a sister method to the state
method (we called it full_state
), which returns the full state instead of just the output-producing part (like how state
does). Then, we just added some capabilities to serialize ArrayRef
s and we achieved serializability for built-in accumulators with very little introduced complexity. AFAICT there is no issue with serializing ScalarValue
s.
This may be something that could live in upstream if there is sufficient interest in it (it doesn't seem to overfit to any particular usage of DF). I will monitor this issue and we may patch this stuff to upstream DF if appropriate.
we'd be very interested in this @ozankabak
Maybe extend it to serialize AggregateExec?
Not super familiar with the subtrait project but would this also serialize the state of the accumulators?
Maybe extend it to serialize AggregateExec?
Not super familiar with the subtrait project but would this also serialize the state of the accumulators?
I guess so. https://substrait.io/
There is also another issue for physical expr serialization https://github.com/apache/datafusion/issues/11350
Is your feature request related to a problem or challenge?
When running continuous computations, we'd like to snapshot the state of our operators many of which use accumulators. This would be key to make computations over continuous streams a first class citizen in DataFusion (see #11365 ).
Describe the solution you'd like
Addition of a
SerializableAccumulator
trait -as well as a method on the
Accumulator
trait -This would mean
ScalarValue
also needs to implement serialization to [u8]. We have a POC PR on our fork of DataFusion for this.Would love to hear feedback from the community on this proposal.
Describe alternatives you've considered
No response
Additional context
No response