dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License
1.58k stars 718 forks source link

Beyond `Scheduler.is_rootish` #8934

Open fjetter opened 1 week ago

fjetter commented 1 week ago

The scheduler currently relies on a crude heuristic to infer topologies that may suggest that certain tasks are "root-ish". If the tasks are detected as such, they are "queued" to avoid memory pressure on the workers due to various scheduling artifacts.

There are a couple of known problems with this

Instead of relying on this heuristic to control queuing I suggest to leverage the newly introduced task spec classes.

I propose to add an attribute to the Task class that is flagging whether or not a task is considered rootish (Instead of root-ish we may actually want to flag the task as IO but this is still TBD)

This flag would work very similar to how annotations worked before with the important difference that we wouldn't loose any annotations when moving from HLG to Low level graph representation. Therefore, to make this work we'd also have to reimplement low level fusion code (which from all we know is still very important for array like workloads, see also https://github.com/dask/dask/issues/11458)

hendrikmakait commented 1 week ago

Note that there is a difference between tasks being rootish and tasks getting not getting queued. In https://github.com/dask/distributed/pull/8873, we had to move from flagging a task as non-rootish to being non-queuable due to the difference in scheduling behavior for rootish and non-rootish tasks while queueing is disabled. This may not be relevant but should be taken into account when replacing Scheduler.is_rootish.