Open vissible opened 11 months ago
I can also reproduce the issue on Ubuntu 20.04. Here are the details:
# ps -o pid,comm,cgroup 2206961
PID COMMAND CGROUP
2206961 app 12:app:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc3a94178_40e0_4739_bcad_4ee8ec576a0f.slice/cri-containerd-c70324c0804812a23c105488e0cb302d3611089c2b75a7b98
# ls /sys/fs/cgroup/devices//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc3a94178_40e0_4739_bcad_4ee8ec576a0f.slice/cri-containerd-c70324c0804812a23c105488e0cb302d3611089c2b75a7b98ff7d094ca9c9e3d.scope/
cgroup.clone_children cgroup.procs devices.allow devices.deny devices.list notify_on_release tasks
# cat /sys/fs/cgroup/devices//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc3a94178_40e0_4739_bcad_4ee8ec576a0f.slice/cri-containerd-c70324c0804812a23c105488e0cb302d3611089c2b75a7b98ff7d094ca9c9e3d.scope/devices.list
c 136:* rwm
c 10:200 rwm
c 5:2 rwm
c 5:0 rwm
c 1:9 rwm
c 1:8 rwm
c 1:7 rwm
c 1:5 rwm
c 1:3 rwm
c *:* m
b *:* m
c 195:255 rw
c 508:0 rw
c 508:1 rw
c 195:7 rw
# systemctl daemon-reload
# cat /sys/fs/cgroup/devices//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc3a94178_40e0_4739_bcad_4ee8ec576a0f.slice/cri-containerd-c70324c0804812a23c105488e0cb302d3611089c2b75a7b98ff7d094ca9c9e3d.scope/devices.list
b *:* m
c *:* m
c 1:3 rwm
c 1:5 rwm
c 1:7 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:2 rwm
c 10:200 rwm
c 136:* rwm
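The two listings above are ordered differently, which makes it easy to miss what changed. A minimal sketch for comparing two saved snapshots of devices.list follows. The function name `lost_rules` and the snapshot paths in the usage comment are hypothetical and not part of any tool mentioned in this thread.

```shell
# Print every rule that appears in the "before" snapshot but is missing
# from the "after" snapshot. Line order in either file does not matter.
lost_rules() {
    before=$1
    after=$2
    # -F: treat patterns as fixed strings (rules contain '*')
    # -x: match whole lines only
    # -v: keep lines of "before" that match no line of "after"
    # -f: read the patterns from the "after" snapshot
    grep -Fxv -f "$after" "$before"
}

# Usage (CG is a placeholder for the container's devices cgroup directory):
#   cat "$CG/devices.list" > /tmp/devices.before
#   systemctl daemon-reload
#   cat "$CG/devices.list" > /tmp/devices.after
#   lost_rules /tmp/devices.before /tmp/devices.after
```

Applied to the two listings above, this would print the four `rw` rules (c 195:255, c 508:0, c 508:1, c 195:7) that are no longer present after the reload.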
I think I have exactly the same issue. nvidia-smi doesn't work because it tries to access /dev/nvidiactl and (due to two eBPF programs being installed) cannot open it. Worst of all, it doesn't happen on every boot, so there's a race condition somewhere.
I will fix it by manually adding all the devices to the container:
--device=/dev/nvidia{-uvm-tools,-uvm,ctl} $( for DEVICE in $( find /dev/ -type c -name 'nvidia?' ); do echo "--device=$DEVICE"; done )
Hi,
I hit an issue which is similar to #48, but all the symlink files exist under /dev/char. I'm not sure whether this is related to systemd, runc, or nvidia-container-toolkit, but since it is very similar to #48, I am opening the issue here.
My environment: Ubuntu 22.04, kernel 5.15.0-88-generic, rke2 v1.26.10+rke2r2 with cilium, nvidia-device-plugin-0.14.2, nvidia driver 535.129.03, nvidia-container-toolkit 1.14.3
At the beginning,
Then run systemctl daemon-reload,
Run systemctl daemon-reload again,
Try to unload the wrong eBPF program,
After unloading the wrong eBPF program, everything becomes normal again.
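Since this thread refers to extra eBPF programs blocking device access, a quick check for how many device programs are attached to a container's cgroup may be useful. The sketch below rests on assumptions: `count_device_progs` is a hypothetical helper name, and it assumes that `bpftool cgroup show` prints one program per line with the attach type in the second column, reported as `device` or `cgroup_device` depending on the bpftool version.

```shell
# Count device-type eBPF programs in `bpftool cgroup show` output read
# from stdin. Assumption: the attach type is the second whitespace-
# separated field and is spelled "device" or "cgroup_device".
count_device_progs() {
    awk '$2 == "device" || $2 == "cgroup_device" { n++ } END { print n + 0 }'
}

# Usage (CGROUP_PATH is a placeholder for the container's cgroup v2 directory):
#   bpftool cgroup show "$CGROUP_PATH" | count_device_progs
# A result greater than 1 may indicate the duplicate-program situation
# described in this thread. A program can then be detached by ID; check
# the bpftool-cgroup man page for the syntax your version accepts, e.g.:
#   bpftool cgroup detach "$CGROUP_PATH" device id <PROG_ID>
```

The helper only counts programs. Deciding which one is stale still requires inspecting the programs themselves, which is outside the scope of this sketch.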