Open gz opened 1 year ago
Oh, that's interesting because it's going in the opposite direction. Instead of `input->state->output`, you're talking about `output->state`. That is obviously not possible in general, since information is usually lost when producing output. I don't know whether there is an interesting class of applications where the state can be derived from the output.
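To make the information-loss point concrete, here is a minimal Python sketch (not Feldera code; `run` and the `(sum, count)` state layout are hypothetical, chosen only for illustration). It assumes a group-by aggregate that keeps a running sum and count per key. Two different input histories produce the same AVG output but different internal state, so the state cannot be derived from that output alone. A plain SUM, by contrast, happens to be a case where the output is the state.

```python
from collections import defaultdict


def run(events):
    """Feed (key, value) events through a toy group-by aggregate.

    Returns (state, sum_output, avg_output), where state maps
    key -> (running_sum, count). All names here are illustrative.
    """
    state = defaultdict(lambda: (0, 0))
    for key, value in events:
        s, c = state[key]
        state[key] = (s + value, c + 1)
    sum_output = {k: s for k, (s, c) in state.items()}
    avg_output = {k: s / c for k, (s, c) in state.items()}
    return dict(state), sum_output, avg_output


# Two different input histories for the same key.
state_a, sum_a, avg_a = run([("x", 2), ("x", 4)])
state_b, sum_b, avg_b = run([("x", 3)])

# AVG output is identical, but the state behind it is not:
# the output does not determine the state.
assert avg_a == avg_b
assert state_a != state_b

# For SUM alone, the only state needed is the running sum,
# which is exactly what the output already contains.
assert sum_a == {k: s for k, (s, _) in state_a.items()}
```

So whether `output->state` works depends on the operator: it can for aggregates whose output carries everything the operator needs to keep updating, and it cannot for ones like AVG that need more than they emit.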
Another possibility that has occurred to me is that there are applications where the output only depends on fairly recent input, so that it could be cheaper to reproduce the state by replaying the input from some fixed amount of time back than to save the state. This risks incorrect output, though, if we are wrong about the output only depending on recent input. (Replaying recent input seems like a disaster-recovery response to me, for when the state database is lost.)
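A rough Python sketch of that replay idea, under the assumption that the output is a per-key sum over a sliding time window (the `WINDOW` constant, `build_state`, and `recover_by_replay` are hypothetical names, not part of any existing API). It also shows the risk: if the replay span is shorter than the window the output really depends on, the recovered state is silently wrong.

```python
WINDOW = 10  # assumed: output depends only on the last 10 time units


def build_state(events, now, window=WINDOW):
    """Per-key sum over events with timestamp in (now - window, now]."""
    state = {}
    for ts, key, value in events:
        if now - window < ts <= now:
            state[key] = state.get(key, 0) + value
    return state


def recover_by_replay(log, now, replay_span):
    """Rebuild state by replaying only events newer than now - replay_span."""
    recent = [e for e in log if e[0] > now - replay_span]
    return build_state(recent, now)


# Input log of (timestamp, key, value) events.
log = [(1, "a", 5), (12, "a", 1), (15, "b", 2), (19, "a", 3)]
now = 20

full = build_state(log, now)                        # state from the whole log
ok = recover_by_replay(log, now, replay_span=10)    # span covers the window
short = recover_by_replay(log, now, replay_span=4)  # span is too short

assert ok == full      # replaying a full window reproduces the state
assert short != full   # a wrong guess about the dependency loses data
```

The replay span has to be a guaranteed upper bound on how far back the output looks, which is exactly the assumption that may not hold in practice.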
https://github.com/feldera/dist-design/blob/85291465e4b4b6c6b2e528df89ef848de3f9199e/README.md?plain=1#L109
I was wondering if there are classes of pipeline applications where "State inside the circuit across all the workers" is cheaper to reproduce from the "Output produced by the circuit in a previous step but not yet acknowledged by its destination" instead of storing it as persistent state (e.g., what I have in mind is something like group-by aggregates for dashboards), and if that's something to consider (maybe/probably not).