dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Update aggregate statistics for `TaskPrefix` instead of calculating them on demand #8680

Closed hendrikmakait closed 3 months ago

hendrikmakait commented 3 months ago

Problem

As outlined in #8677, having too many task groups can make the on-demand calculation of the statistics prohibitively expensive.

Solution

We have a few places within `TaskPrefix` where we loop over all of its groups. Refactoring these to eagerly update aggregated values would avoid the problems faced in #8677 altogether, allow us to keep using the TaskProgress dashboard even when there are many task groups, and be a more widely applicable improvement.
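
To illustrate the idea, here is a minimal sketch of on-demand versus eager aggregation. The `Prefix` and `Group` classes and all of their methods are hypothetical stand-ins made up for this example; they are not the actual `TaskPrefix`/`TaskGroup` implementation in `distributed`, and the real change may look quite different.

```python
from collections import Counter


class Prefix:
    """Hypothetical stand-in for a task prefix (not distributed.TaskPrefix)."""

    def __init__(self, name):
        self.name = name
        self.groups = []
        # Aggregate that is kept up to date eagerly on every change.
        self.state_counts = Counter()

    def states_on_demand(self):
        # On-demand variant: cost grows with the number of groups
        # every time the statistics are requested.
        total = Counter()
        for group in self.groups:
            total.update(group.states)
        return total


class Group:
    """Hypothetical stand-in for a task group (not distributed.TaskGroup)."""

    def __init__(self, name, prefix):
        self.name = name
        self.prefix = prefix
        self.states = Counter()
        prefix.groups.append(self)

    def add_task(self, state="waiting"):
        self.states[state] += 1
        # Eager update: adjust the prefix-level aggregate right away.
        self.prefix.state_counts[state] += 1

    def transition(self, old, new):
        self.states[old] -= 1
        self.states[new] += 1
        # Eager update: constant work per transition, independent of
        # how many groups the prefix has.
        self.prefix.state_counts[old] -= 1
        self.prefix.state_counts[new] += 1


prefix = Prefix("inc")
group_a = Group("inc-a", prefix)
group_b = Group("inc-b", prefix)
for _ in range(3):
    group_a.add_task()
for _ in range(2):
    group_b.add_task()
group_a.transition("waiting", "memory")

# Reading the eager aggregate does not touch the groups at all.
print(prefix.state_counts["waiting"], prefix.state_counts["memory"])
```

In this sketch, reading `prefix.state_counts` is a plain attribute lookup, whereas `states_on_demand()` has to visit every group on each call. The trade-off is that every state change now has to update two counters instead of one.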