RS-DAT / JupyterDaskOnSLURM


New job for each worker? #19

Open · Peter9192 opened this issue 1 year ago

Peter9192 commented 1 year ago

Thanks for the comprehensive guide! I tried to follow your steps (on Snellius), but I'm not entirely sure if I understand it correctly.

When I submit the jobscript, I request one job on a quarter node (32 cores). This launches a JupyterLab session with a Dask scheduler (but no workers). Then I click the scale button to get workers; I tried with 5. When I did this, I noticed in squeue that 5 new jobs were started, each on a separate node, presumably following the configuration in the Dask config file.

If I'm not mistaken, that means I have now requested (and am being charged for) 6 * 32 cores. The scheduler job in particular seems wasteful. Wouldn't it be much more efficient if the scheduler used one core and the remaining 31 cores and memory allocation of my original jobscript were dedicated to the workers?
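For context, here is a minimal sketch of how dask-jobqueue maps each requested worker to its own SLURM job; the cores, memory, and walltime values are assumptions based on the quarter-node setup described above, not the repository's actual configuration:

```python
# Minimal sketch, assuming quarter-node-sized worker jobs (values are
# illustrative, not the repository's actual dask-jobqueue configuration).
from dask_jobqueue import SLURMCluster
from dask.distributed import Client

cluster = SLURMCluster(
    cores=32,            # cores per worker job
    memory="56GiB",      # memory per worker job
    walltime="01:00:00",
)
client = Client(cluster)

# Each worker requested here is submitted as a separate SLURM job,
# on top of the job already running JupyterLab and the scheduler.
cluster.scale(jobs=5)
```

With five worker jobs plus the original job, that adds up to the 6 * 32 cores mentioned above.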

fnattino commented 1 year ago

Hi @Peter9192, thanks for testing and for pointing this out - what you describe is exactly what's happening.

Part of the issue is probably related to the configuration of Snellius, which does not allow for smaller jobs: Spider does not have this constraint, so one can start a small job (one or two cores) to run the Jupyter server and the Dask scheduler, and configure larger jobs for the workers. I guess this is due to Snellius targeting real HPC problems, while Spider is geared more towards high-throughput workloads.

Anyway, right now Dask Jobqueue, which is what we use to set up the Dask cluster on SLURM, does not seem to support creating an additional local worker to occupy the resources on the node where Jupyter and the Dask scheduler are running. You could still start a worker to occupy these resources "manually", e.g. by running something like the following shell command in a separate terminal window within JupyterLab (you could get <SCHEDULER_ADDRESS> from client.scheduler.address):

/home/$USER/mambaforge/envs/jupyter_dask/bin/python -m distributed.cli.dask_worker <SCHEDULER_ADDRESS> --nthreads 32 --memory-limit 56GiB --name LocalWorker --nanny --death-timeout 600 --local-directory $TMPDIR

but I agree this is less than optimal.
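For completeness, a small Python sketch of retrieving the scheduler address to paste in place of <SCHEDULER_ADDRESS>; it assumes the cluster object created by dask-jobqueue is available in the notebook:

```python
# Run inside the JupyterLab session where the Dask cluster was created.
# 'cluster' is assumed to be the SLURMCluster instance set up earlier.
from dask.distributed import Client

client = Client(cluster)
print(client.scheduler.address)  # e.g. tcp://<node-ip>:8786 (illustrative)
```

The printed address can then be substituted into the dask_worker command above.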

I have opened an issue here; let's see whether there is interest in solving this within Dask Jobqueue: https://github.com/dask/dask-jobqueue/issues/596