Parsl / parsl

Parsl - a Python parallel scripting library
http://parsl-project.org
Apache License 2.0

Smarter site selection methods #492

Open yadudoc opened 6 years ago

yadudoc commented 6 years ago

Our site selection strategy is to pick an executor at random from the executors list. We need to support smarter strategies, for example: send tasks to all executors, attempt to scale, and once one site has found workers, cancel the pending tasks on the others.
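To make the two behaviours concrete, here is a minimal toy sketch of random selection next to the "submit everywhere, cancel the losers" idea. It is not Parsl code: `ToySite`, `pick_random`, `broadcast`, and `on_workers_found` are hypothetical names invented for illustration, and real executors would need an actual cancellation mechanism.

```python
import random


class ToySite:
    """Hypothetical stand-in for an executor/site; not a Parsl class."""

    def __init__(self, label):
        self.label = label
        self.pending = []        # task ids queued but not yet started
        self.has_workers = False


def pick_random(sites, rng=random):
    """Current behaviour as described above: pick any site at random."""
    return rng.choice(sites)


def broadcast(task_id, sites):
    """Proposed behaviour, step 1: queue the task on every site."""
    for site in sites:
        site.pending.append(task_id)


def on_workers_found(winner, sites):
    """Proposed behaviour, step 2: once one site has workers,
    cancel the duplicate pending tasks on all other sites.

    Returns a dict mapping site label to the task ids cancelled there.
    """
    winner.has_workers = True
    cancelled = {}
    for site in sites:
        if site is not winner and site.pending:
            cancelled[site.label] = list(site.pending)
            site.pending.clear()
    return cancelled


sites = [ToySite("a"), ToySite("b"), ToySite("c")]
broadcast("t1", sites)
broadcast("t2", sites)
cancelled = on_workers_found(sites[1], sites)  # site "b" got workers first
```

In this sketch only the winning site keeps its queue; the cost of the approach is the duplicate submissions and scale-up requests made on the sites that lose.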

This is a requirement from the Xenon folks (@bridel, @ershockley).

yadudoc commented 5 years ago

A concrete use case would be to prefer executors with available slots when sending out tasks. One shortcoming is that our strategy mechanism looks at the tasks pending at an executor to decide whether more blocks should be provisioned. If we only sent tasks to sites with available capacity, a site with no capacity would never accumulate a backlog and so would never be scaled up, and we could easily end up in a deadlock.
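The deadlock can be shown with a small toy model. This is a sketch under assumptions, not the actual strategy code: `ToyExecutor`, `select_with_capacity`, and `wants_more_blocks` are hypothetical names, and the scale-up rule is a simplified stand-in for "scale when the executor's backlog exceeds its capacity".

```python
from dataclasses import dataclass


@dataclass
class ToyExecutor:
    """Hypothetical executor model; not a Parsl class."""
    label: str
    slots: int = 0      # worker slots currently provisioned
    running: int = 0    # tasks currently running
    pending: int = 0    # tasks queued at this executor

    @property
    def free_slots(self):
        return self.slots - self.running


def select_with_capacity(executors):
    """Prefer the executor with the most free slots; None if nobody has any."""
    candidates = [e for e in executors if e.free_slots > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.free_slots)


def wants_more_blocks(executor):
    """Simplified scale-up rule: only an executor-side backlog triggers scaling."""
    return executor.pending + executor.running > executor.slots


# Two sites that have not provisioned anything yet, and tasks waiting in the DFK.
executors = [ToyExecutor("site_a"), ToyExecutor("site_b")]
tasks_waiting_in_dfk = 5

target = select_with_capacity(executors)                     # nobody has capacity
scaling = [e.label for e in executors if wants_more_blocks(e)]  # nobody has backlog
deadlocked = target is None and not scaling and tasks_waiting_in_dfk > 0
```

With no free slots anywhere, no task is ever submitted, so no executor ever sees a backlog and nothing scales up: `deadlocked` is true in this toy setup.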

If we were to do late-binding, i.e. put runnable tasks in a ready_for_launch state and submit them to executors only when capacity is available, we would need a few DFK enhancements. We would need a means for the strategy/provider to notify the DFK when a new block comes online, so as to trigger task launch. We would also need to rethink how the strategy decides to scale up, since it currently depends on the task backlog at the executor.
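A rough sketch of what late-binding could look like, assuming a DFK-side queue and a callback from the strategy/provider. Everything here is hypothetical (`ToyLateBindingDFK`, `mark_ready`, `on_block_online`, `backlog`); none of these exist in Parsl, and the real DFK would need locking and error handling.

```python
from collections import deque


class ToyLateBindingDFK:
    """Hypothetical DFK model that holds tasks until capacity exists."""

    def __init__(self):
        self.ready_for_launch = deque()  # runnable tasks not yet bound to a site
        self.free_slots = {}             # executor label -> free worker slots
        self.launched = []               # (task_id, executor label) in launch order

    def mark_ready(self, task_id):
        """Called when a task's dependencies are resolved."""
        self.ready_for_launch.append(task_id)
        self._launch()

    def on_block_online(self, label, slots):
        """Called by the strategy/provider when a new block comes online."""
        self.free_slots[label] = self.free_slots.get(label, 0) + slots
        self._launch()

    def backlog(self):
        """DFK-level backlog the strategy could use instead of executor queues."""
        return len(self.ready_for_launch)

    def _launch(self):
        """Bind queued tasks to whichever executors currently have free slots."""
        for label in self.free_slots:
            while self.ready_for_launch and self.free_slots[label] > 0:
                task_id = self.ready_for_launch.popleft()
                self.free_slots[label] -= 1
                self.launched.append((task_id, label))


dfk = ToyLateBindingDFK()
for task_id in ("t1", "t2", "t3"):
    dfk.mark_ready(task_id)          # no capacity yet, tasks wait in the DFK
dfk.on_block_online("site_a", 2)     # two tasks launch, one stays queued
```

The design point is that the scale-up signal moves from the executor queues to `backlog()` on the DFK side, which avoids the situation where tasks are never submitted and therefore never counted.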

Another way to handle this would be to allow the strategy to steal/redistribute tasks between executors. This would also be new work, but it would only require new logic in the strategy and supporting hooks in the executors. Rebalancing would be triggered at the strategy's frequency (1s-30s), but since it is only ever needed for long-running tasks, that delay should be tolerable. This functionality would also be useful once we allow tasks to call apps.
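As an illustration of the redistribution idea, here is a minimal sketch of one rebalancing pass that a strategy could run on each tick. The function `rebalance` and its dict-based inputs are hypothetical; executors would need real support for withdrawing a queued task before anything like this could work.

```python
def rebalance(pending, free_slots):
    """One rebalancing pass (hypothetical helper, not a Parsl API).

    pending:    dict of executor label -> list of queued task ids
    free_slots: dict of executor label -> number of idle worker slots

    Moves queued tasks from the executor with the longest backlog to
    executors with idle slots. Both dicts are updated in place.
    Returns a list of (task_id, from_label, to_label) moves.
    """
    moves = []
    for idle_label in free_slots:
        while free_slots[idle_label] > 0:
            donors = [
                label for label, queue in pending.items()
                if label != idle_label and queue
            ]
            if not donors:
                break
            donor = max(donors, key=lambda label: len(pending[label]))
            task_id = pending[donor].pop()   # steal from the tail of the queue
            free_slots[idle_label] -= 1
            moves.append((task_id, donor, idle_label))
    return moves


pending = {"site_a": ["t1", "t2", "t3"], "site_b": []}
free_slots = {"site_a": 0, "site_b": 2}
moves = rebalance(pending, free_slots)
```

Stolen tasks are assumed to start immediately on the idle slot, so they are not added to the receiving executor's pending list.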

The TBI project could be early users for this.