Open · itzsimpl opened this issue 10 months ago
I'm surprised that `nproc` has this behavior, to be honest.
I'll review the PR, but it's a bit of a sensitive topic: setting the wrong number of threads can quickly cause performance issues one way or the other (too few cores in use vs. too many threads). I'll check with my colleagues what they think.
@flx42 I agree, that surprised me too. For completeness, I'm referencing the CPU oversubscription issue (https://github.com/NVIDIA/NeMo/issues/8141) that led me into this investigation. It turned out there is an issue in numba (https://github.com/numba/numba/issues/9387), which resets the value of torch's `num_threads` whenever numba's `num_threads` is read or set.
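A rough way to observe the side effect described in the numba issue (a sketch only; the exact behaviour depends on the numba version and its threading layer):

```sh
python - <<'EOF'
import torch, numba

torch.set_num_threads(4)
print("torch threads before:", torch.get_num_threads())  # 4
numba.get_num_threads()                                   # merely querying numba's thread count
print("torch threads after: ", torch.get_num_threads())   # reportedly no longer 4
EOF
```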
My proposal is to keep the behaviour consistent, especially since torch recommends setting `num_threads` to nCPU/nTasks, and because `nproc` is updated too (and some bash scripts rely on that value). Do check with your colleagues, please.
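For concreteness, the nCPU/nTasks guidance amounts to roughly the following in a Slurm job step (a sketch only, assuming `--cpus-per-task` was requested so that `SLURM_CPUS_PER_TASK` is set; the actual PR may differ):

```sh
# Give each task the CPUs Slurm allocated to it instead of a hardcoded 1.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
```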
FWIW, `nproc` does take into account both `OMP_NUM_THREADS` and `OMP_THREAD_LIMIT`: https://www.gnu.org/software/coreutils/manual/html_node/nproc-invocation.html
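For example, on a machine with 8 online CPUs (illustrative values):

```sh
$ nproc
8
$ OMP_NUM_THREADS=1 nproc       # value taken from OMP_NUM_THREADS
1
$ OMP_THREAD_LIMIT=4 nproc      # capped by OMP_THREAD_LIMIT
4
```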
When running a PyTorch container in Slurm with `cpus-per-task` set, `nproc` reports a wrong value (1). This is caused by the 50-slurm-pytorch.sh hook, which hardcodes `OMP_NUM_THREADS` to 1; I have opened a PR (https://github.com/NVIDIA/enroot/pull/174) with a fix that is based on current PyTorch multiprocessing best practices.
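To make the symptom concrete, this is the kind of thing one sees inside such a container (illustrative values, assuming `--cpus-per-task=16` and that Slurm's job environment is visible in the container):

```sh
$ echo "$OMP_NUM_THREADS"
1                       # hardcoded by the 50-slurm-pytorch.sh hook
$ nproc
1                       # honours OMP_NUM_THREADS, hiding the 16 allocated CPUs
$ OMP_NUM_THREADS="$SLURM_CPUS_PER_TASK" nproc
16
```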