TimelyDataflow / timely-dataflow

A modular implementation of timely dataflow in Rust
MIT License

Tracking progress towards completion. #340

Open ryzhyk opened 3 years ago

ryzhyk commented 3 years ago

Is there a way to estimate "progress towards completion" in the context of either timely or differential? Let's say we pushed some updates to DD, advanced the timestamp, and are now calling `step_or_park` in a loop, waiting for the probe to report that all updates have propagated through the pipeline. This can take a long time, e.g., when populating the pipeline with initial data or when processing a small update that triggers a large amount of recomputation. In these situations it would be nice to comfort the user with a little progress bar showing approximately how much longer they have to wait.

@frankmcsherry, do you happen to have some magic up your sleeve that might help here? :)

@Kixiron, @RDambrosio016.

frankmcsherry commented 3 years ago

There is a lingering PR #321 that I need to clean up before merging, which would expose progress information in a logging channel. Depending on what makes it in, it could report either the outstanding capabilities in the system (message counts on channels, and retained capabilities in operators) or the frontier of outstanding capabilities ("what's holding up the system").
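To make the two options concrete, here is a minimal sketch of how a consumer of such progress information might keep track of outstanding work and derive a frontier from it. Everything in it is an assumption for illustration: `Outstanding`, `update`, `frontier`, and `outstanding_up_to` are hypothetical names, timestamps are assumed to be totally ordered `u64` values, and none of this is the API that PR #321 or timely itself exposes.

```rust
use std::collections::BTreeMap;

/// Hypothetical tracker (not part of timely): net count of outstanding
/// capabilities and in-flight messages at each totally ordered timestamp.
#[derive(Default)]
pub struct Outstanding {
    counts: BTreeMap<u64, i64>,
}

impl Outstanding {
    /// Apply one progress update: a positive `delta` when messages or
    /// capabilities appear at `time`, a negative one when they are retired.
    pub fn update(&mut self, time: u64, delta: i64) {
        let entry = self.counts.entry(time).or_insert(0);
        *entry += delta;
        if *entry == 0 {
            // Drop settled timestamps so the map only holds live work.
            self.counts.remove(&time);
        }
    }

    /// The earliest timestamp with a positive count, i.e. "what's holding
    /// up the system". `None` means nothing is outstanding.
    pub fn frontier(&self) -> Option<u64> {
        self.counts.iter().find(|(_, c)| **c > 0).map(|(t, _)| *t)
    }

    /// Total positive count at timestamps up to and including `target`.
    pub fn outstanding_up_to(&self, target: u64) -> i64 {
        self.counts
            .range(..=target)
            .map(|(_, c)| *c)
            .filter(|c| *c > 0)
            .sum()
    }
}
```

A driver could feed every logged update into `update` and show `frontier()` or `outstanding_up_to(t)` for the timestamp `t` it is waiting on. Note that the outstanding count is not monotone: processing one message can produce many more, so it gives a rough indication of remaining work rather than a time estimate.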

I think this is the most obvious form of progress to report; there is also scheduling information about how many times and for how long various operators have been executed, but timely doesn't know too much about whether they are making progress when invoked.

Operators can also use their own custom logging if they have a clearer notion of progress to report (perhaps they have internal queues they are working to drain; timely wouldn't know about them, but the operators can still hook into the logging and report them).
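As a rough sketch of that last idea, an operator that knows about its own internal queue could publish two counters that the driver loop reads to draw a progress bar. This uses plain shared atomics instead of timely's logging infrastructure, and `QueueProgress`, `record_enqueued`, `record_drained`, `fraction`, and `render_bar` are all hypothetical names introduced here for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical progress handle shared between an operator and the driver.
/// Cloning it yields another handle onto the same counters.
#[derive(Clone, Default)]
pub struct QueueProgress {
    enqueued: Arc<AtomicU64>,
    drained: Arc<AtomicU64>,
}

impl QueueProgress {
    /// Called by the operator when it adds `n` items to its internal queue.
    pub fn record_enqueued(&self, n: u64) {
        self.enqueued.fetch_add(n, Ordering::Relaxed);
    }

    /// Called by the operator when it finishes processing `n` items.
    pub fn record_drained(&self, n: u64) {
        self.drained.fetch_add(n, Ordering::Relaxed);
    }

    /// Fraction of known work completed, or `None` if nothing was enqueued.
    pub fn fraction(&self) -> Option<f64> {
        let total = self.enqueued.load(Ordering::Relaxed);
        if total == 0 {
            return None;
        }
        // Counters are read separately, so cap `done` at `total`.
        let done = self.drained.load(Ordering::Relaxed).min(total);
        Some(done as f64 / total as f64)
    }
}

/// Render a fixed-width text bar such as `[##------] 25%`.
pub fn render_bar(fraction: f64, width: usize) -> String {
    let f = fraction.clamp(0.0, 1.0);
    let filled = (f * width as f64).round() as usize;
    format!(
        "[{}{}] {:.0}%",
        "#".repeat(filled),
        "-".repeat(width - filled),
        f * 100.0
    )
}
```

The driver would call `fraction()` between steps and pass the result to `render_bar`. The estimate is only as good as the operator's knowledge of its total work: if recomputation keeps enqueuing new items, the bar can move backwards.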