fjetter opened this issue 2 years ago
There is another double/multiple counting problem in `_set_duration_estimate` that concerns tasks with shared dependencies. `_set_duration_estimate` is evaluated once per task without any regard for shared dependencies. Therefore, specifically for graphs where N tasks share one common node, this node's transfer cost is vastly overestimated since it is counted N times.
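A back-of-the-envelope sketch of the effect (made-up numbers, not the scheduler's actual code):

```python
# N tasks all depend on one large shared node; a per-task estimate that
# adds the transfer cost for every task counts that single transfer N times.
nbytes_shared = 1e9        # size of the common dependency (1 GB), made up
bandwidth = 100e6          # assumed bandwidth estimate (100 MB/s)
compute = 0.5              # assumed per-task compute estimate (seconds)
n_tasks = 100              # N tasks sharing the dependency

comm = nbytes_shared / bandwidth          # 10 s to transfer the node once

per_task_estimate = compute + comm        # 10.5 s, i.e. compute + comm
occupancy = n_tasks * per_task_estimate   # 1050 s when summed per task ...

# ... even though the dependency only needs to be transferred once per worker
realistic = n_tasks * compute + comm      # 60 s
```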
This double counting can be catastrophic for cases where the transfer cost is potentially larger than, or of a similar size as, the compute cost. Apart from an erroneous `worker_objective`, this can lead to workers being misclassified as idle, which then causes very aggressive work stealing where all tasks are stolen by the worker holding the dependency. An extreme example is https://github.com/dask/distributed/issues/6573
This double counting appears to go back to https://github.com/dask/distributed/pull/773
We're double counting estimated network cost in multiple places
First, we're calculating the estimated network cost of the dependencies a worker needs to fetch in `_set_duration_estimate` and are setting the result to `WorkerState.processing`, i.e. `processing = compute + comm`. This is also used to set the worker's occupancy.

When making a scheduling decision, we're typically using `Scheduler.worker_objective`, which calculates a `start_time` that is defined as

https://github.com/dask/distributed/blob/b133009cee88fd48c8a345cffde0a8e9163426a6/distributed/scheduler.py#L3000-L3001
i.e. `start_time` adds an explicit comm term on top of the occupancy-based stack time, even though the occupancy already includes the estimated comm cost, so the dependencies' transfer cost is counted twice.
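A rough numerical sketch of the effect (the names below are simplified stand-ins for the quantities in the linked lines, not the actual scheduler code):

```python
# Assumed per-task estimates (made-up numbers)
compute = 0.5   # estimated compute time in seconds
comm = 10.0     # estimated time to transfer the task's dependencies

# processing / occupancy already contain the comm estimate
occupancy = compute + comm   # via _set_duration_estimate
nthreads = 1

stack_time = occupancy / nthreads   # 10.5 s, comm counted here ...
explicit_comm = comm                # ... and again as the explicit comm term

start_time = stack_time + explicit_comm   # 20.5 s instead of ~10.5 s
```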
A similar double counting is introduced on the work stealing side when calculating the `cost_multiplier`, which (roughly) divides the estimated transfer time by the occupancy-based compute time, a quantity that already includes the comm estimate. I.e. for network-heavy tasks this ratio converges towards 1, which is quite the opposite of what it is supposed to encode.
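Again a rough numerical sketch, assuming `cost_multiplier` is essentially transfer time divided by the occupancy-based compute time (made-up numbers):

```python
compute = 0.5    # estimated compute time in seconds
comm = 10.0      # estimated transfer time of the dependencies

transfer_time = comm
compute_time = compute + comm    # occupancy-based estimate, comm included

cost_multiplier = transfer_time / compute_time   # ~0.95, i.e. close to 1

# With an un-inflated compute time, the ratio would correctly flag the task
# as very expensive to move:
cost_multiplier_uninflated = transfer_time / compute   # 20.0
```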