jakirkham opened this issue 3 years ago
Or perhaps another way to look at this would be: what is the goal of reevaluate_occupancy? Is it just to identify tasks that would be good candidates for stealing? If so, should it not be run when stealing is disabled? At least currently, that doesn't appear to be how the code behaves, but we could change this.
TBH I'm not actually sure how to do this. Looking at the code in a few places, it seems like reevaluate_occupancy gets added to the IOLoop (see below), but there doesn't appear to be a clear way to influence whether it is called, either by disabling it or by reducing how often it is called.
Yeah, the easy way would be to remove the first line of code that you link to. The other way would be to make the next_time timedelta a configuration value. Currently it is 0.1s. You would probably add a config value for this and then set it to 1 hour or something if you wanted to turn it "off".
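To make the two options concrete, here is a minimal sketch using plain asyncio rather than the scheduler's Tornado IOLoop. `PeriodicReevaluator` and `run_for` are hypothetical names, and the callback only counts calls instead of walking workers. It assumes the real method re-arms itself after each pass with a `next_time` delay, as described above, and it models "remove the first line" as `interval=None`.

```python
import asyncio


class PeriodicReevaluator:
    """Hypothetical stand-in for a self-rescheduling occupancy loop.

    Each call does some work, then schedules the next call after
    `interval` seconds. `interval=None` models never registering the
    callback on the event loop in the first place.
    """

    def __init__(self, interval):
        self.interval = interval  # seconds between passes, or None to disable
        self.calls = 0
        self._handle = None

    def start(self, loop):
        # One-time registration on the event loop (the "first line").
        if self.interval is None:
            return
        self._handle = loop.call_soon(self._reevaluate, loop)

    def _reevaluate(self, loop):
        self.calls += 1  # placeholder for re-aligning worker estimates
        # Re-arm the timer with a fixed delay, like a configurable next_time.
        self._handle = loop.call_later(self.interval, self._reevaluate, loop)

    def stop(self):
        if self._handle is not None:
            self._handle.cancel()


async def run_for(interval, seconds):
    """Run the loop for `seconds` and report how many passes happened."""
    loop = asyncio.get_running_loop()
    r = PeriodicReevaluator(interval)
    r.start(loop)
    await asyncio.sleep(seconds)
    r.stop()
    return r.calls


if __name__ == "__main__":
    print(asyncio.run(run_for(None, 0.2)))  # 0: never registered
    print(asyncio.run(run_for(3600, 0.2)))  # 1: runs at startup, next pass is an hour away
```

With a 1 hour interval the callback still runs once at startup and then effectively never again during a typical workload, so a large config value acts as a soft "off" switch, while skipping the registration disables it completely.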
Or perhaps another way to look at this would be: what is the goal of reevaluate_occupancy?
At the risk of sounding pedantic, I'm going to copy the docstring here.
Periodically reassess task duration time
The expected duration of a task can change over time. Unfortunately we
don't have a good constant-time way to propagate the effects of these
changes out to the summaries that they affect, like the total expected
runtime of each of the workers, or what tasks are stealable.
In this coroutine we walk through all of the workers and re-align their
estimates with the current state of tasks. We do this periodically
rather than at every transition, and we only do it if the scheduler
process isn't under load (using psutil.Process.cpu_percent()). This
lets us avoid this fringe optimization when we have better things to
think about.
So yes, it's useful for stealing, but also any other time that we use a worker's occupancy.
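The docstring's point about stale summaries can be illustrated with a toy model. This is a simplified sketch, not distributed's implementation: `Worker`, `reevaluate_occupancy_sketch`, `least_occupied`, and the 50% CPU threshold are hypothetical choices for illustration, and `cpu_percent` is passed in as a callable where the docstring mentions `psutil.Process.cpu_percent()`. It shows a cached occupancy going stale when a task's expected duration changes, and an occupancy-based choice (here, picking the least-loaded worker) flipping once the cache is re-aligned.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Worker:
    # Hypothetical minimal worker record (not distributed's WorkerState).
    name: str
    processing: List[str] = field(default_factory=list)  # task keys assigned here
    occupancy: float = 0.0  # cached estimate of total expected runtime, in seconds


def reevaluate_occupancy_sketch(
    workers: List[Worker],
    expected_duration: Dict[str, float],
    cpu_percent: Callable[[], float],
    cpu_threshold: float = 50.0,  # assumed threshold, for illustration only
) -> bool:
    """Recompute each worker's cached occupancy from current duration estimates.

    Skips the pass when the scheduler process looks busy. Returns True if it ran.
    """
    if cpu_percent() >= cpu_threshold:
        return False
    for ws in workers:
        ws.occupancy = sum(expected_duration[key] for key in ws.processing)
    return True


def least_occupied(workers: List[Worker]) -> str:
    # Any decision keyed on occupancy reads the cached value, stale or not.
    return min(workers, key=lambda ws: ws.occupancy).name


if __name__ == "__main__":
    a = Worker("a", ["x"], occupancy=1.0)  # cached when "x" was expected to take 1 s
    b = Worker("b", ["y", "z"], occupancy=2.0)
    durations = {"x": 5.0, "y": 1.0, "z": 1.0}  # "x" is now expected to take 5 s
    print(least_occupied([a, b]))  # prints a (stale estimate)
    reevaluate_occupancy_sketch([a, b], durations, cpu_percent=lambda: 10.0)
    print(least_occupied([a, b]))  # prints b (after re-alignment)
```

Until the periodic pass runs, "a" still looks like the emptier worker, which is why turning the pass off affects more than stealing.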
Sure, I was just thinking about what the right knobs would be for turning this off (I'm not attached to how we do that). Sounds like making the interval configurable is the thing to do, so let's do that. Thanks for the feedback here 🙂
During the call earlier, we discussed turning off reevaluate_occupancy and seeing how things ran. TBH I'm not actually sure how to do this. Looking at the code in a few places, it seems like reevaluate_occupancy gets added to the IOLoop (see below), but there doesn't appear to be a clear way to influence whether it is called, either by disabling it or by reducing how often it is called. Is there one and I'm just overlooking it? Or should we add one? Relatedly, we can disable stealing, which we've played with before, but that only affects things after a fair bit of reevaluate_occupancy work has already happened. So I'm not sure that is what we had in mind. Thoughts? 🙂
https://github.com/dask/distributed/blob/383ea0326ae103b5d5e0b62ed9c3cb18510c5b9e/distributed/scheduler.py#L3273
https://github.com/dask/distributed/blob/383ea0326ae103b5d5e0b62ed9c3cb18510c5b9e/distributed/scheduler.py#L6490-L6492
cc @mrocklin @quasiben