dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

max number of tasks handled by a dask worker #8491

Open llodds opened 7 months ago

llodds commented 7 months ago

I am using SGECluster to submit thousands of tasks to dask workers. I would like to request a feature that limits the maximum number of tasks each worker handles, to improve cluster usage. For example, suppose each task takes 4 hours to process and the worker's wall-time limit is set to 5 hours (so that a single task can run to completion, and so that a worker on a misbehaving compute node times out after 5 hours). With the current dask configuration, each worker picks up a second task, wastes its remaining hour on it, and that second task is eventually killed and resubmitted to another worker. This wastes compute cluster resources. So, is it possible to specify a maximum number of tasks X handled by each dask worker? Once a worker has finished handling X tasks (whatever their final status), the worker (the SGE job) would automatically shut down, so we don't waste computing resources in the cluster.

I would like a similar feature for SLURMCluster as well, and I would appreciate any alternative workarounds.

mrocklin commented 7 months ago

My guess is that this is too specific to be implemented in Dask directly, but you could easily implement it yourself as a WorkerPlugin.

https://distributed.dask.org/en/stable/plugins.html#worker-plugins
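
For illustration, here is a minimal sketch of what such a plugin could look like, assuming the goal is for each worker to retire itself after finishing a fixed number of tasks. The class name `MaxTasksPlugin` and the `max_tasks` parameter are illustrative, not an existing Dask API:

```python
# Sketch of a WorkerPlugin that retires the worker after it has
# finished max_tasks tasks (whatever their final status).
from distributed import WorkerPlugin


class MaxTasksPlugin(WorkerPlugin):
    def __init__(self, max_tasks=1):
        self.max_tasks = max_tasks
        self.completed = 0

    def setup(self, worker):
        # Keep a handle on the worker so we can shut it down later.
        self.worker = worker

    def transition(self, key, start, finish, **kwargs):
        # Count tasks that reached a terminal state on this worker,
        # whether they succeeded ("memory") or failed ("error").
        if finish in ("memory", "error"):
            self.completed += 1
            if self.completed >= self.max_tasks:
                # close_gracefully is a coroutine, so schedule it on the
                # worker's event loop rather than calling it directly.
                self.worker.loop.add_callback(self.worker.close_gracefully)
```

Registering it with something like `client.register_plugin(MaxTasksPlugin(max_tasks=1))` (or `client.register_worker_plugin(...)` on older versions) should give a copy of the plugin to every worker that joins the cluster, so each SGE or SLURM job would shut itself down after its quota of tasks; whether replacement workers are then launched depends on how your cluster scaling is configured.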
