dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Option to turn off `reevaluate_occupancy` #4523

Open jakirkham opened 3 years ago

jakirkham commented 3 years ago

During the call earlier, we discussed turning off reevaluate_occupancy and seeing how things ran. TBH I'm not actually sure how to do this. Looking at the code in a few places, it seems like reevaluate_occupancy gets added to the IOLoop (see below), but there doesn't appear to be a clear way to influence whether it is called, either by disabling it or by reducing the frequency of the call. Is there one and I'm just overlooking it? Or should we add one? Relatedly, we can disable stealing, which we've played with before, but this only affects things after going through a fair bit of reevaluate_occupancy. So I'm not sure this is what we had in mind. Thoughts? 🙂

https://github.com/dask/distributed/blob/383ea0326ae103b5d5e0b62ed9c3cb18510c5b9e/distributed/scheduler.py#L3273

https://github.com/dask/distributed/blob/383ea0326ae103b5d5e0b62ed9c3cb18510c5b9e/distributed/scheduler.py#L6490-L6492

cc @mrocklin @quasiben

jakirkham commented 3 years ago

Or perhaps another way to look at this would be: what is the goal of reevaluate_occupancy? Is it just to identify tasks that would be good candidates for stealing? If so, should it not run when stealing is disabled? At least currently, that doesn't appear to be how the code behaves, but we could change this.

mrocklin commented 3 years ago

> TBH I'm not actually sure how to do this. Looking at the code in a few places, it seems like reevaluate_occupancy gets added to the IOLoop (see below), but there doesn't appear to be a clear way to influence whether it is called either by disabling or delaying the frequency of the call

Yeah, the easy way would be to remove the first line of code that you link to. The other way would be to make the next_time timedelta a configuration value. Currently it is at 0.1s. You would probably make a config value for this and then set it to 1 hour or something if you wanted to turn it "off".
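To illustrate the second option, here is a minimal, self-contained sketch rather than the actual scheduler code. `ToyScheduler`, `get_interval`, and the `reevaluate-occupancy-interval` key are all hypothetical names invented for this example; only the 0.1s default comes from the comment above. The callback re-schedules itself with the configured delay, so a very large value means it effectively never fires.

```python
import asyncio
from datetime import timedelta

# Default taken from the discussion above (0.1s); everything else is illustrative.
DEFAULT_INTERVAL = timedelta(seconds=0.1)


def get_interval(config):
    """Read the interval (in seconds) from a plain dict standing in for real config.

    The key name "reevaluate-occupancy-interval" is an assumption for this
    sketch, not an existing Dask setting.
    """
    seconds = config.get("reevaluate-occupancy-interval")
    if seconds is None:
        return DEFAULT_INTERVAL
    return timedelta(seconds=float(seconds))


class ToyScheduler:
    """Stand-in for a scheduler that periodically re-schedules one callback."""

    def __init__(self, config=None):
        self.interval = get_interval(config or {})
        self.calls = 0

    def reevaluate_occupancy(self, loop):
        # The real work would happen here; the sketch only counts invocations.
        self.calls += 1
        # Re-arm the timer using the configured interval.
        loop.call_later(
            self.interval.total_seconds(), self.reevaluate_occupancy, loop
        )


async def run_for(scheduler, duration):
    """Start the periodic callback, wait `duration` seconds, report call count."""
    loop = asyncio.get_running_loop()
    loop.call_later(
        scheduler.interval.total_seconds(), scheduler.reevaluate_occupancy, loop
    )
    await asyncio.sleep(duration)
    return scheduler.calls
```

With the interval set to 3600 seconds the callback stays registered but does not run during any realistic test, which matches the "set it to 1 hour" suggestion without needing a separate on/off switch.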

> Or perhaps another way to look at this would be what is the goal of reevaluate_occupancy?

At the risk of sounding pedantic, I'm going to copy the docstring here.

Periodically reassess task duration time

    The expected duration of a task can change over time.  Unfortunately we
    don't have a good constant-time way to propagate the effects of these
    changes out to the summaries that they affect, like the total expected
    runtime of each of the workers, or what tasks are stealable.

    In this coroutine we walk through all of the workers and re-align their
    estimates with the current state of tasks.  We do this periodically
    rather than at every transition, and we only do it if the scheduler
    process isn't under load (using psutil.Process.cpu_percent()).  This
    lets us avoid this fringe optimization when we have better things to
    think about.

So yes, it's useful for stealing, but also any other time that we use a worker's occupancy.
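As a rough picture of what the docstring describes, here is a toy model, not the scheduler's implementation. The data layout (dicts of task names and durations), the function name `recompute_occupancy`, and the 50% load threshold are assumptions made for the sketch; the real code reads load from psutil.Process.cpu_percent(), which is passed in here as a plain number to keep the example dependency-free.

```python
def recompute_occupancy(workers, durations, cpu_percent, threshold=50.0):
    """Re-align each worker's occupancy with the current duration estimates.

    workers:     dict mapping worker name -> list of task names it is processing
    durations:   dict mapping task name -> current expected duration in seconds
    cpu_percent: scheduler process load, supplied by the caller
    threshold:   hypothetical cutoff above which the pass is skipped

    Returns a dict of worker name -> total expected runtime, or None when the
    scheduler is considered too busy to bother.
    """
    if cpu_percent >= threshold:
        # Under load: skip this optional bookkeeping pass entirely.
        return None
    return {
        name: sum(durations[task] for task in tasks)
        for name, tasks in workers.items()
    }


# Duration estimates drifted since the tasks were assigned.
workers = {"w1": ["load", "parse"], "w2": ["parse"], "w3": []}
durations = {"load": 2.0, "parse": 0.5}

occupancy = recompute_occupancy(workers, durations, cpu_percent=10.0)
# occupancy == {"w1": 2.5, "w2": 0.5, "w3": 0}
```

Anything that compares workers by expected runtime, stealing included, would read from a summary like this, which is why the pass matters even with stealing disabled.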

jakirkham commented 3 years ago

Sure, I was just thinking about what the right bits would be for turning this off (not attached to how we do that). Sounds like making the time configurable is the thing to do, so let's do that. Thanks for the feedback here 🙂