eth-cscs / sarus

OCI-compatible engine to deploy Linux containers on HPC environments.
https://sarus.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Too many cpus requested #14

Closed haampie closed 4 years ago

haampie commented 4 years ago

On my PC with an AMD Ryzen 7 3700X and Linux 5.4.0, the number of requested CPUs is too large, which causes the container to fail to start.

The generated config.json for runc contains ... "linux":{"resources":{"cpu":{"cpus":"0-31"}} ... which indeed corresponds to the Cpus_allowed_list:

$ cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list:  0-31

but I only have 8 cores / 16 threads:

$ nproc
16

The error I'm getting is

ERRO[0000] container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"failed to write \\\"0-31\\\" to \\\"/sys/fs/cgroup/cpuset/container-ccxounuuahznpsds/cpuset.cpus\\\": write /sys/fs/cgroup/cpuset/container-ccxounuuahznpsds/cpuset.cpus: invalid argument\"" 

If I hard-code cpus to 0-15 everything is fine.
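For illustration, the manual workaround above amounts to restricting the allowed CPU list to the CPUs that are actually online. Below is a minimal Python sketch of that idea. It is not the fix that landed in Sarus. The function names are hypothetical, and it assumes the two input strings come from the `Cpus_allowed_list` field of `/proc/self/status` and from `/sys/devices/system/cpu/online`.

```python
def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as "0-3,8,10-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpu_list(cpus):
    """Format a set of CPU ids back into the compact "a-b,c" notation."""
    ranges = []
    start = prev = None
    for cpu in sorted(cpus):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu  # extend the current contiguous range
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def clamp_to_online(allowed_text, online_text):
    """Hypothetical helper: keep only allowed CPUs that are also online."""
    return format_cpu_list(parse_cpu_list(allowed_text) & parse_cpu_list(online_text))


# Values from this report: allowed list "0-31", 16 hardware threads (assumed online as "0-15").
print(clamp_to_online("0-31", "0-15"))
```

With the values from this report, the result is the `0-15` string that works when hard-coded as `cpus` in config.json.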

Madeeks commented 4 years ago

Hello @haampie, we are aware that the mechanism for assigning CPU affinity does not work properly in some situations. A fix was merged into the development branch a couple of days ago (https://github.com/eth-cscs/sarus/commit/c98a44eb6c8a2022495a4443a0a41345781a4a42) and will be available in the next tagged release.

haampie commented 4 years ago

Ah, thanks, didn't notice that!