dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Sort dashboard progress bars in topological order? #6983

Open gjoseph92 opened 2 years ago

gjoseph92 commented 2 years ago

I believe the progress bars on the dashboard are currently sorted by group size (largest first):

https://github.com/dask/distributed/blob/bfc5cfea80450954dba5b87a5858cb2e3bac1833/distributed/diagnostics/progress_stream.py#L94

This is a cheap metric that only sometimes approximates topological order. It's wrong for any fan-out operation (repartition, shuffle, etc.), where a downstream group can be larger than its inputs.

But progress might be easier to watch and decipher if the bars were in actual topological order. They would then be fullest at the top and least full at the bottom.

It took me a long time using dask to actually understand what the progress bars were showing, I think because the order in which they completed felt so random.

It seems doable to maintain topological ordering, but doing it efficiently might require keeping some state between updates.
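To illustrate the difference between the two orderings, here is a minimal sketch using only the standard library. It is not how the dashboard is implemented: the `groups` mapping, the task counts, and the helper names `by_size` and `by_topology` are all hypothetical, chosen to mirror the example in this issue.

```python
from graphlib import TopologicalSorter

# Hypothetical task groups: name -> (number of tasks, groups it depends on).
# The counts are made up; they only illustrate a fan-out after make-timeseries.
groups = {
    "make-timeseries": (10, set()),
    "repartition": (40, {"make-timeseries"}),
    "sub": (40, {"repartition"}),
    "dataframe-count": (41, {"sub"}),
    "dataframe-sum": (41, {"sub"}),
}


def by_size(groups):
    """Order group names by task count, largest first."""
    return sorted(groups, key=lambda name: groups[name][0], reverse=True)


def by_topology(groups):
    """Order group names so every group comes after its dependencies."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in groups.items()})
    return list(sorter.static_order())


print(by_size(groups))      # make-timeseries ends up last
print(by_topology(groups))  # make-timeseries comes first
```

Recomputing the full sort on every dashboard update is the simple version; caching the order and only re-sorting when a new group appears would be one way to keep the state mentioned above.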

Current

image

In the above example, make-timeseries comes first in topological order, then the repartitions, then sub, and finally dataframe-count and dataframe-sum.

Proposed

progress

fjetter commented 2 years ago

+1 for the UX

I'm wondering how best to implement this. Maybe task groups should/could/must track the min/max priorities of the tasks they contain? Maybe min/max priorities per Computation, to avoid mixed-up state.
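A rough sketch of that idea, under loud assumptions: priorities are plain comparable tuples where a smaller value means "runs earlier", and the `PriorityRange` class, the `track` helper, and the `(computation, group, priority)` input shape are hypothetical, not existing scheduler APIs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

Priority = Tuple[int, ...]  # assumption: smaller tuple = scheduled earlier


@dataclass
class PriorityRange:
    """Hypothetical per-group record of the min/max task priority seen so far."""

    min_priority: Optional[Priority] = None
    max_priority: Optional[Priority] = None

    def add(self, priority: Priority) -> None:
        # O(1) update each time a task joins the group.
        if self.min_priority is None or priority < self.min_priority:
            self.min_priority = priority
        if self.max_priority is None or priority > self.max_priority:
            self.max_priority = priority


def track(
    tasks: Iterable[Tuple[str, str, Priority]],
) -> Dict[Tuple[str, str], PriorityRange]:
    """Accumulate ranges keyed by (computation, group) so computations don't mix."""
    ranges: Dict[Tuple[str, str], PriorityRange] = {}
    for computation, group, priority in tasks:
        ranges.setdefault((computation, group), PriorityRange()).add(priority)
    return ranges


# Made-up tasks for a single hypothetical computation "c1".
tasks = [
    ("c1", "sub", (0, 5)),
    ("c1", "make-timeseries", (0, 0)),
    ("c1", "repartition", (0, 3)),
    ("c1", "make-timeseries", (0, 1)),
    ("c1", "sub", (0, 4)),
]
ranges = track(tasks)

# Sort the groups of one computation by their earliest task priority.
ordered = sorted(
    (group for (computation, group) in ranges if computation == "c1"),
    key=lambda group: ranges[("c1", group)].min_priority,
)
print(ordered)
```

Sorting by the minimum priority is only a proxy for topological order; whether it matches the true order depends on how priorities are assigned, which this sketch does not model.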