TimelyDataflow / timely-dataflow

A modular implementation of timely dataflow in Rust
MIT License

Activator: Bound memory utilization by compaction and de-duplication #470

Open antiguru opened 2 years ago

The Activator object allows one to force the scheduling of Timely operators even in the absence of progress changes. Care must be taken not to schedule operators too often, because every scheduled activation is held in memory until the scheduler picks it up. Repeated activations of the same operator therefore accumulate between scheduler runs.

To avoid this, we propose the following changes when activating:

In Materialize, we often activate an upstream operator once a downstream dataflow operator is dropped, or activate a source operator when the source has new data and Timely needs to schedule it. To avoid sending too many activations, we use a pattern where the operator is activated at most once; when it runs, it must acknowledge the activation, which re-enables future activations. This pattern works well, but adds complexity for developers writing Timely operators. The proposed changes would remove that burden, at the cost of some (amortized) overhead. For specific operators, the activate-acknowledge pattern might still be useful.
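For illustration, the activate-acknowledge pattern could be sketched roughly as follows. This is not Materialize's or Timely's code: `AckActivator`, its `activate` and `ack` methods, and the closure standing in for the real activation call (for example a call to `Activator::activate`) are all hypothetical names chosen for this sketch, which uses only the standard library.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical wrapper (not part of Timely): forwards at most one
/// activation until the operator acknowledges it.
#[derive(Clone)]
pub struct AckActivator {
    /// True while an activation has been forwarded but not yet acknowledged.
    pending: Arc<AtomicBool>,
    /// Assumption: stands in for the real activation call.
    activate: Arc<dyn Fn() + Send + Sync>,
}

impl AckActivator {
    pub fn new(activate: impl Fn() + Send + Sync + 'static) -> Self {
        Self {
            pending: Arc::new(AtomicBool::new(false)),
            activate: Arc::new(activate),
        }
    }

    /// Requests an activation. Returns true if it was forwarded, false if
    /// an earlier activation is still unacknowledged.
    pub fn activate(&self) -> bool {
        // `swap` returns the previous value, so only the caller that flips
        // the flag from false to true forwards the activation.
        if !self.pending.swap(true, Ordering::SeqCst) {
            (self.activate)();
            true
        } else {
            false
        }
    }

    /// Called by the operator once it is running; re-enables activations.
    pub fn ack(&self) {
        self.pending.store(false, Ordering::SeqCst);
    }
}
```

In this sketch the operator would call `ack()` at the start of its scheduled work, before inspecting its inputs, so that an activation requested while it is working is forwarded rather than dropped. However many callers request an activation, at most one is outstanding per operator, which is the bound the pattern provides.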