CSB5 / lofreq

LoFreq Star: Sensitive variant calling from sequencing data
http://csb5.github.io/lofreq/
Other

CPU count versus available CPUs #132

Open EricDeveaud opened 2 years ago

EricDeveaud commented 2 years ago

Hello,

lofreq2_call_pparallel.py uses multiprocessing.cpu_count() to get the number of CPUs. multiprocessing.cpu_count() returns the number of CPUs in the machine, but this is not the same as the number of CPUs available to the process. For example, the process may run in a taskset context or under a batch scheduler like Slurm.

see:

$ nproc
96
$ taskset -c 1 nproc
1
$ taskset -c 1 python3 -c "import multiprocessing; print(multiprocessing.cpu_count())"
96

I suggest using len(os.sched_getaffinity(0)) instead of multiprocessing.cpu_count():

$ python3 -c "import os; print(len(os.sched_getaffinity(0)))"
96
$ taskset -c 1 python3 -c "import os; print(len(os.sched_getaffinity(0)))"
1

NB: Python on macOS does not provide os.sched_getaffinity, so a portable way to code it would be:

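import multiprocessing
import os

try:
    # CPUs in this process's affinity mask (e.g. as restricted by taskset)
    num_cpus = len(os.sched_getaffinity(0))
except AttributeError:
    # os.sched_getaffinity is unavailable (e.g. macOS): fall back to the machine-wide count
    num_cpus = multiprocessing.cpu_count()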

Thus lofreq2_call_pparallel.py may launch more parallel jobs via multiprocessing.Pool than there are available cores, with each job competing with the others for the same cores.
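A minimal sketch of how the Pool could be sized from the affinity mask instead of multiprocessing.cpu_count(). The names available_cpus, pool_size, call_region and the region list are hypothetical and are not taken from lofreq2_call_pparallel.py; the `requested` parameter stands in for a user-supplied thread count and is capped at the usable CPUs.

```python
import multiprocessing
import os


def available_cpus():
    """Return how many CPUs this process may run on (hypothetical helper)."""
    try:
        # Size of the CPU affinity mask, e.g. 1 under `taskset -c 1`
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # os.sched_getaffinity is missing on macOS and Windows:
        # fall back to the machine-wide count
        return multiprocessing.cpu_count()


def pool_size(requested=None):
    """Number of Pool workers to start.

    `requested` is a hypothetical user-supplied thread count; it is capped
    at the usable CPUs so workers do not compete for the same cores.
    """
    usable = available_cpus()
    if requested is None:
        return usable
    return max(1, min(requested, usable))


def call_region(region):
    # Placeholder for one per-region calling job (hypothetical); echoes its input
    return region


if __name__ == "__main__":
    regions = ["chr1", "chr2", "chr3"]
    with multiprocessing.Pool(processes=pool_size()) as pool:
        # pool.map preserves input order: prints ['chr1', 'chr2', 'chr3']
        print(pool.map(call_region, regions))
```

Under `taskset -c 1`, pool_size() would return 1 on Linux, so only one worker is started instead of one per machine CPU.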

Regards,

Eric