Open arneyjfs opened 3 months ago
Looking at the source, the logic for -1 definitely results in nworkers being set to 256.
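As a quick sanity check of that arithmetic, here is a minimal sketch of the documented rule (CPU_COUNT + 1 + nworkers for negative values). The helper name resolve_nworkers is hypothetical and this is not dask's actual source, just the formula from the docs written out:

```python
def resolve_nworkers(nworkers: int, cpu_count: int) -> int:
    """Hypothetical helper: applies the documented rule for negative values.

    This mirrors the formula quoted from the documentation; it is not
    copied from the dask source.
    """
    if nworkers < 0:
        # Documented rule: CPU_COUNT + 1 + nworkers
        return cpu_count + 1 + nworkers
    return nworkers


# The two machine sizes mentioned in this thread
print(resolve_nworkers(-1, 256))   # 256 + 1 - 1
print(resolve_nworkers(-1, 12))    # 12 + 1 - 1
print(resolve_nworkers(256, 256))  # positive values pass through unchanged
```

So on paper -1 and an explicit 256 should ask for the same number of workers on a 256-core machine, which matches what is being reported.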
I think, given that you confirm that setting --nworkers 256 results in the same thing, the whole negative math thing is a red herring.
In which case there must be some problem with starting that many workers on machines with high CPU core counts.
My machine has 12 cores and behaves as expected with -1.
I can also start 256 workers quite happily with dask worker --nworkers 256 --nthreads 1.
I wonder if this is an issue with machines with high numbers of CPU cores. I'll spin up a big cloud VM and try the same thing.
Describe the issue:
The documentation says that if the argument passed to --nworkers is negative, then (CPU_COUNT + 1 + nworkers) is used for the number of processes. I have 2 machines in the cluster, both with the same specs (nproc = 256), however I do not get 256 workers.
Full CPU specs:
The number I actually get seems to fluctuate each time I run the dask worker command. One server normally starts with around 210 workers, and the other with around 70 workers, but this changes. The UI therefore reports about 280 workers in total with 1 thread each.
Firstly, why the variability? And secondly, how can I maximise this count? The workloads I need to run are simple, single-process, medium-length tasks.
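To put exact numbers on the per-server counts rather than reading them off the UI, something like the following might help. It is only a sketch: the helper name workers_per_host and the sample addresses are made up for illustration, and it assumes worker addresses look like tcp://host:port.

```python
from collections import Counter
from urllib.parse import urlparse


def workers_per_host(worker_addresses):
    """Count workers per host from addresses such as 'tcp://10.0.0.1:40001'."""
    return Counter(urlparse(addr).hostname for addr in worker_addresses)


# Made-up addresses standing in for a real cluster
sample = [
    "tcp://10.0.0.1:40001",
    "tcp://10.0.0.1:40002",
    "tcp://10.0.0.2:40001",
]
for host, count in sorted(workers_per_host(sample).items()):
    print(host, count)
```

On a live cluster the addresses could be taken from the keys of client.scheduler_info()["workers"] on a distributed.Client, assuming that call returns every worker in your version (some versions may only return a subset by default). Running it after each dask worker start would show whether the shortfall is always on the same machine.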
Minimal Complete Verifiable Example:
Anything else we need to know?:
Things I've ruled out:
- Setting --nworkers explicitly to 256 behaves the same as -1
- Changing --nthreads makes no change
Environment: