simonbyrne closed this issue 2 years ago.
The fallbacks were already intended to address cgroupv1. What is wrong with those?
It isn't giving the correct result on our Slurm cluster. If I do:
```
$ srun -t 01:10:00 --mem 2G --pty bash -l    # start job with 2G of memory
$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes    # what is read by uv__get_constrained_memory_fallback
9223372036854771712
$ cat /proc/self/cgroup    # cgroups of current process
11:blkio:/
10:memory:/slurm/uid_5184/job_28925944/step_0
9:freezer:/slurm/uid_5184/job_28925944/step_0
8:pids:/
7:cpuset:/slurm/uid_5184/job_28925944/step_0
6:net_prio,net_cls:/
5:perf_event:/
4:hugetlb:/
3:devices:/slurm/uid_5184/job_28925944/step_0/task_0
2:cpuacct,cpu:/
1:name=systemd:/system.slice/slurmd.service
$ cat /sys/fs/cgroup/memory/slurm/uid_5184/job_28925944/step_0/memory.limit_in_bytes    # current memory controller cgroup
2147483648
```
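In other words, the limit needs to be read from the directory that `/proc/self/cgroup` lists for the memory controller, not from the root of the hierarchy. Here is a minimal C sketch of that approach (illustrative only, not the code from my PR or libuv's actual fallback; it assumes the memory controller is mounted on its own at `/sys/fs/cgroup/memory`, as in the output above):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch only: assumes the memory controller is listed on its own
 * (as "N:memory:/path") in /proc/self/cgroup. */
static uint64_t read_cgroupv1_memory_limit(void) {
  char buf[1024];
  char cgroup_path[512] = "";
  uint64_t limit = 0;
  FILE* fp;

  /* Find the memory controller's cgroup path, e.g.
   * "10:memory:/slurm/uid_5184/job_28925944/step_0". */
  fp = fopen("/proc/self/cgroup", "r");
  if (fp == NULL)
    return 0;
  while (fgets(buf, sizeof(buf), fp) != NULL)
    if (sscanf(buf, "%*d:memory:%511[^\n]", cgroup_path) == 1)
      break;
  fclose(fp);

  if (cgroup_path[0] == '\0')
    return 0;

  /* Read the limit from that cgroup's directory rather than from the
   * hierarchy root (/sys/fs/cgroup/memory/memory.limit_in_bytes). */
  snprintf(buf, sizeof(buf),
           "/sys/fs/cgroup/memory%s/memory.limit_in_bytes", cgroup_path);
  fp = fopen(buf, "r");
  if (fp == NULL)
    return 0;
  if (fscanf(fp, "%" SCNu64, &limit) != 1)
    limit = 0;
  fclose(fp);

  /* A huge value like 9223372036854771712 means "no limit" in cgroupv1. */
  return limit;
}

int main(void) {
  printf("memory limit: %" PRIu64 " bytes\n", read_cgroupv1_memory_limit());
  return 0;
}
```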
Any thoughts on this? I'd really like to get this into 1.9 (or ideally backported to 1.8).
Actually, I'll try to upstream it, since I'm working on that right now anyway.
Thanks, please let me know if I can do anything to help.
I updated your PR so that it still correctly reads the limits I'm seeing (ref https://github.com/JuliaLang/libuv/pull/27#discussion_r993128302) and pushed it to https://github.com/libuv/libuv/pull/3754. Can you verify that it works on your Slurm system?
Superseded by https://github.com/JuliaLang/libuv/pull/32
Seems to fix the issue reported in https://github.com/JuliaLang/julia/pull/46796#issuecomment-1272117544
I admit I know very little about cgroups or C programming, so this may well be incorrect, but it was able to correctly determine the cgroup limits on our Slurm cluster.