LeHenschel opened this issue 1 year ago
I think multi-CPU management is still an open issue.
I thought limiting the CPU availability via Singularity (or Docker) might actually be the best option, as documented in https://docs.sylabs.io/guides/main/user-guide/cgroups.html. That page also describes another way to limit CPU usage -- through systemd-run, which should be available in Ubuntu 22.04 by default (https://docs.sylabs.io/guides/main/user-guide/cgroups.html#applying-resource-limits-with-external-tools, https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html).
I am not really sure whether this solves the issue, but it is worth having a look at.
It should also be mentioned that, right now, the default value for --threads in run_fastsurfer.sh is 1, which means that both inference and segstats.py get significantly slower. I am not sure about N4; it might be that N4 currently "circumvents" the thread limitation. Generally, this means that you need to manually specify a reasonable value for --threads to get close to the one-minute segmentation target.
Description
The number of CPU threads used is not controllable via the --threads option for the FastSurfer segmentation modules. In the FastSurfer surface pipeline, it is only controllable when threads is set to 1.
Overall, setting the environment variable OMP_NUM_THREADS in run_fastsurfer.sh instead of recon-surf.sh may also solve the issue for --threads 1. Other values (threads > 1) are, however, not guaranteed to keep the CPU usage at the requested thread count (neither in the segmentation nor in the surface module). The issue here is numpy's multi-threading:
In its default state, numpy will use all available threads for all functions compiled against multi-threaded C libraries (OpenBLAS, MKL, ...). This can cause issues in two ways: a) CPU overload when running in parallel, b) slowdown of functions for small matrices/operations (basically unnecessary overhead). There is no option to change this in numpy per se (mainly because a catch-all solution for all the different C libraries is difficult, see e.g. https://github.com/numpy/numpy/issues/16990, https://github.com/numpy/numpy/issues/11826).
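For illustration, a minimal sketch (assuming the threadpoolctl package is installed alongside numpy) that inspects which of these thread pools numpy is actually linked against and how many threads they currently use:

```python
# Minimal sketch (assumes threadpoolctl is installed).
# Lists the thread pools (OpenBLAS, MKL, OpenMP, ...) that numpy's
# compiled routines will use -- by default typically all available cores.
import numpy as np
from threadpoolctl import threadpool_info

for pool in threadpool_info():
    print(pool["internal_api"], "->", pool["num_threads"], "threads")
```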
Short term solution
Set all potentially relevant environment variables to a specific value before (!) numpy is imported. This is a simple solution, with the drawback that all relevant variables (https://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy) have to be known and set (and the list might change over time).
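A minimal sketch of this approach; the variable list follows the Stack Overflow answer linked above and is not guaranteed to be complete:

```python
import os

# Must run before numpy (or anything that imports numpy) is imported.
# The list of variables follows the linked Stack Overflow answer and may
# need to be extended for other BLAS/OpenMP backends.
n_threads = "1"  # e.g. the value passed via --threads
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS",
            "VECLIB_MAXIMUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = n_threads

import numpy as np  # now picks up the limits set above
```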
Permanent fix
The current recommendation (per this discussion on the numpy GitHub: https://github.com/numpy/numpy/issues/11826) is to use the threadpoolctl package to wrap all relevant functions. This way, user-specified thread counts can actually be honored, rather than limiting everything to 1. This would require several changes in Lapy and FastSurfer; see the sketch below.
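A hedged sketch of what such a wrapper could look like (the wrapper name and its arguments are only illustrative, not the actual FastSurfer or Lapy API):

```python
import numpy as np
from threadpoolctl import threadpool_limits

def run_with_thread_limit(func, *args, threads=1, **kwargs):
    """Illustrative wrapper: restrict BLAS/OpenMP pools only while func runs."""
    with threadpool_limits(limits=threads):
        return func(*args, **kwargs)

# Example: a large matrix product limited to the user-specified thread count.
a = np.random.rand(2000, 2000)
result = run_with_thread_limit(np.matmul, a, a.T, threads=4)
```

The advantage over the environment-variable approach is that the limit is scoped to the wrapped call, so the user-specified --threads value can be applied where it helps and lifted where single-threading would only add overhead.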