simonbyrne closed this issue 2 years ago.
The fallbacks were already intended to address cgroupv1. What is wrong with those?
It isn't giving the correct result on our Slurm cluster. If I do:
```
$ srun -t 01:10:00 --mem 2G --pty bash -l    # start job with 2G of memory
$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes    # what is read by uv__get_constrained_memory_fallback
9223372036854771712
$ cat /proc/self/cgroup    # cgroups of current process
11:blkio:/
10:memory:/slurm/uid_5184/job_28925944/step_0
9:freezer:/slurm/uid_5184/job_28925944/step_0
8:pids:/
7:cpuset:/slurm/uid_5184/job_28925944/step_0
6:net_prio,net_cls:/
5:perf_event:/
4:hugetlb:/
3:devices:/slurm/uid_5184/job_28925944/step_0/task_0
2:cpuacct,cpu:/
1:name=systemd:/system.slice/slurmd.service
$ cat /sys/fs/cgroup/memory/slurm/uid_5184/job_28925944/step_0/memory.limit_in_bytes    # current memory controller cgroup
2147483648
```
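In other words, the limit needs to be read from the directory that `/proc/self/cgroup` lists for the memory controller, not from the root of the hierarchy. Here is a minimal C sketch of that approach (illustrative only, not the code from my PR or libuv's actual fallback; it assumes the memory controller is mounted on its own at `/sys/fs/cgroup/memory`, as in the output above):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch only: assumes the memory controller is listed on its own
 * (as "N:memory:/path") in /proc/self/cgroup. */
static uint64_t read_cgroupv1_memory_limit(void) {
  char buf[1024];
  char cgroup_path[512] = "";
  uint64_t limit = 0;
  FILE* fp;

  /* Find the memory controller's cgroup path, e.g.
   * "10:memory:/slurm/uid_5184/job_28925944/step_0". */
  fp = fopen("/proc/self/cgroup", "r");
  if (fp == NULL)
    return 0;
  while (fgets(buf, sizeof(buf), fp) != NULL)
    if (sscanf(buf, "%*d:memory:%511[^\n]", cgroup_path) == 1)
      break;
  fclose(fp);

  if (cgroup_path[0] == '\0')
    return 0;

  /* Read the limit from that cgroup's directory rather than from the
   * hierarchy root (/sys/fs/cgroup/memory/memory.limit_in_bytes). */
  snprintf(buf, sizeof(buf),
           "/sys/fs/cgroup/memory%s/memory.limit_in_bytes", cgroup_path);
  fp = fopen(buf, "r");
  if (fp == NULL)
    return 0;
  if (fscanf(fp, "%" SCNu64, &limit) != 1)
    limit = 0;
  fclose(fp);

  /* A huge value like 9223372036854771712 means "no limit" in cgroupv1. */
  return limit;
}

int main(void) {
  printf("memory limit: %" PRIu64 " bytes\n", read_cgroupv1_memory_limit());
  return 0;
}
```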
Any thoughts on this? I'd really like to get this into 1.9 (or ideally backported to 1.8).
Actually, I'll try to upstream it, since I'm working on that right now anyway.
Thanks, please let me know if I can do anything to help.
I updated your PR so that it still correctly reads the limits I'm seeing (ref https://github.com/JuliaLang/libuv/pull/27#discussion_r993128302) and pushed it to https://github.com/libuv/libuv/pull/3754. Can you verify that it works on your Slurm system?
Superseded by https://github.com/JuliaLang/libuv/pull/32
Seems to fix the issue reported in https://github.com/JuliaLang/julia/pull/46796#issuecomment-1272117544
I admit I know very little about cgroups or C programming, so this may well be incorrect, but it was able to correctly determine the cgroup limits on our Slurm cluster.