alicevision / AliceVision

Photogrammetric Computer Vision Framework
http://alicevision.org

Find maximum number of CPUs given within a cgroup #1646

Open tomgreen66 opened 5 months ago

tomgreen66 commented 5 months ago

In an HPC cluster environment, job schedulers such as Slurm use cgroups to give each job a cpuset. Currently AliceVision (and hence Meshroom) reports the total number of processors on the compute node, while the job scheduler may have given the job access to only a limited set of CPUs. The user therefore has to remember to limit the number of CPUs to what Slurm has allocated.
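To make the mismatch concrete, here is a minimal sketch for Linux/glibc that reads both numbers. The helper names `nodeCpuCount` and `allowedCpuCount` are hypothetical and are not part of AliceVision. Inside a Slurm job restricted by a cpuset, the second value would be expected to be the smaller one.

```cpp
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   // needed for sched_getaffinity and the CPU_* macros
#endif
#include <sched.h>     // sched_getaffinity, cpu_set_t, CPU_ZERO, CPU_COUNT
#include <unistd.h>    // sysconf

// Hypothetical helper: CPUs online on the whole node, ignoring any cpuset.
long nodeCpuCount()
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

// Hypothetical helper: CPUs the calling thread is allowed to run on.
// Returns -1 if the affinity mask cannot be read.
int allowedCpuCount()
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    // pid 0 means "the calling thread"
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}
```

Printing both values from within a job allocation and from a login shell on the same node is a quick way to check whether the scheduler's limit is visible to the process.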

Would it be worth changing the code for get_total_cpus at:

https://github.com/alicevision/AliceVision/blob/3a0be0fef01d19a0bb1cb5054f23be6591de0301/src/aliceVision/system/cpu.cpp#L133-L143

to instead use:

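#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <sched.h>

// Count the CPUs in the calling thread's affinity mask, which reflects
// any cpuset imposed through a cgroup.
int get_total_cpus()
{
 cpu_set_t cs;
 CPU_ZERO(&cs);
 // pid 0 means the calling thread; on failure return 0 so that the
 // caller can fall back to the current method.
 if (sched_getaffinity(0, sizeof(cs), &cs) != 0)
  return 0;
 return CPU_COUNT(&cs);
}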

This should return the CPUs that are actually available to the software rather than the total number on the node. The CMake configuration may need an update to test for the existence of sched_getaffinity, so that the code can fall back to the current method when it is missing, something like:

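include(CheckSymbolExists)
# sched_getaffinity is a GNU extension, so _GNU_SOURCE must be defined for the check
list(APPEND CMAKE_REQUIRED_DEFINITIONS -D_GNU_SOURCE)
CHECK_SYMBOL_EXISTS(sched_getaffinity sched.h HAVE_SCHED_GETAFFINITY)
list(REMOVE_ITEM CMAKE_REQUIRED_DEFINITIONS -D_GNU_SOURCE)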
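
On the C++ side, the result of that check could guard the new code path. The following is only a sketch under a few assumptions: `HAVE_SCHED_GETAFFINITY` is passed to the compiler as a preprocessor definition (the CMake check alone only sets a cache variable), the function name `availableCpus` is hypothetical, and `std::thread::hardware_concurrency()` is a stand-in for the existing node-wide method rather than what cpu.cpp actually does.

```cpp
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <thread>
#if defined(HAVE_SCHED_GETAFFINITY)
# include <sched.h>
#endif

// Hypothetical function, not the AliceVision implementation.
int availableCpus()
{
#if defined(HAVE_SCHED_GETAFFINITY)
    cpu_set_t mask;
    CPU_ZERO(&mask);
    // Prefer the affinity mask, which honours a cgroup cpuset.
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0)
    {
        const int n = CPU_COUNT(&mask);
        if (n > 0)
            return n;
    }
#endif
    // Fallback (assumed stand-in for the current node-wide method).
    const unsigned n = std::thread::hardware_concurrency();
    return n > 0 ? static_cast<int>(n) : 1;
}
```

If the call fails at runtime, for example because the fixed-size `cpu_set_t` is too small for the machine, the function still falls through to the node-wide count instead of returning an unusable value.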