NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs

Similar issue to #48, but with a wrong cgroup_device eBPF program #192

Open vissible opened 11 months ago

vissible commented 11 months ago

Hi,

I ran into an issue similar to #48, but in my case all of the symlinks exist under /dev/char. I'm not sure whether this is related to systemd, runc, or nvidia-container-toolkit, but since it's very similar to #48 I'm opening the issue here.

My environment:

- Ubuntu 22.04, kernel 5.15.0-88-generic
- RKE2 v1.26.10+rke2r2 with Cilium
- nvidia-device-plugin 0.14.2
- NVIDIA driver 535.129.03
- nvidia-container-toolkit 1.14.3

At the beginning, everything works and only one device program is attached to the container's cgroup:

### on localhost ###
# kubectl exec  app-5d77f99955-nknpt  -- nvidia-smi -L 
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-5a703cdc-c4cc-4a16-0559-88232312123)

###  on worker node ###
# ll /dev/char/195\:*
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:0 -> ../nvidia0
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:1 -> ../nvidia1
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:2 -> ../nvidia2
lrwxrwxrwx 1 root root 17 Nov 13 16:12 /dev/char/195:254 -> ../nvidia-modeset
lrwxrwxrwx 1 root root 12 Dec 13 15:06 /dev/char/195:255 -> ../nvidiactl
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:3 -> ../nvidia3
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:4 -> ../nvidia4
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:5 -> ../nvidia5
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:6 -> ../nvidia6
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:7 -> ../nvidia7
# ps -ef|grep app
root      1265455       1  0 Nov27 ?        00:00:05 app
# ps -o pid,comm,cgroup 1265455
    PID COMMAND         CGROUP
1265455 app            0::/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-poddcf4735c_2ae9_4e49_9597_46647839e3fb.slice/cri-containerd-a88bad885cc806f01ed0011ce0f99d3b4c10f97dd40f0c20671f78
# bpftool cgroup list /sys/fs/cgroup//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-poddcf4735c_2ae9_4e49_9597_46647839e3fb.slice/cri-containerd-a88bad885cc806f01ed0011ce0f99d3b4c10f97dd40f0c20671f782d23427cc4.scope/
ID       AttachType      AttachFlags     Name
13129    device          multi
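
As an aside, the cgroup path above can be derived from the container's host PID instead of being copied by hand. A minimal sketch, assuming cgroup v2 mounted at /sys/fs/cgroup (as on this node); the PID is just the example from above:

pid=1265455                                               # host PID of a process in the container
cg="/sys/fs/cgroup$(cut -d: -f3- /proc/${pid}/cgroup)"    # /proc/<pid>/cgroup is "0::<path>" on cgroup v2
bpftool cgroup list "${cg}"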

Then I run systemctl daemon-reload, and a second device program appears on the container's cgroup:

### on worker node ###
# systemctl daemon-reload
# ll /dev/char/195\:*
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:0 -> ../nvidia0
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:1 -> ../nvidia1
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:2 -> ../nvidia2
lrwxrwxrwx 1 root root 17 Nov 13 16:12 /dev/char/195:254 -> ../nvidia-modeset
lrwxrwxrwx 1 root root 12 Dec 13 15:06 /dev/char/195:255 -> ../nvidiactl
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:3 -> ../nvidia3
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:4 -> ../nvidia4
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:5 -> ../nvidia5
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:6 -> ../nvidia6
lrwxrwxrwx 1 root root 10 Dec 13 15:06 /dev/char/195:7 -> ../nvidia7
# bpftool cgroup list /sys/fs/cgroup//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-poddcf4735c_2ae9_4e49_9597_46647839e3fb.slice/cri-containerd-a88bad885cc806f01ed0011ce0f99d3b4c10f97dd40f0c20671f782d23427cc4.scope/
ID       AttachType      AttachFlags     Name
13129    device          multi
13237    device          multi
# bpftool prog show id 13237
13237: cgroup_device  tag 22d263640f2fec01  gpl
        loaded_at 2023-12-13T13:12:22+0800  uid 0
        xlated 1336B  jited 1029B  memlock 4096B
# bpftool prog list |grep 22d263640f2fec01
13209: cgroup_device  tag 22d263640f2fec01  gpl
13237: cgroup_device  tag 22d263640f2fec01  gpl
13241: cgroup_device  tag 22d263640f2fec01  gpl
13255: cgroup_device  tag 22d263640f2fec01  gpl

### on localhost ###
# kubectl exec  app-5d77f99955-nknpt -- nvidia-smi -L  
Failed to initialize NVML: Unknown Error
command terminated with exit code 255
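
My understanding (not verified against the systemd/runc sources) is that daemon-reload makes systemd re-apply the scope unit's device policy and attach a fresh cgroup_device program. Since the programs are attached with the multi flag, an access has to be allowed by every attached program, and the regenerated one does not cover the NVIDIA device nodes that were injected directly by the runtime, so the container loses access to /dev/nvidiactl and friends. A quick sketch for spotting scopes in this state; the kubepods.slice layout is assumed from this node, adjust the path for other setups:

find /sys/fs/cgroup/kubepods.slice -type d -name 'cri-containerd-*.scope' |
while read -r cg; do
    n=$(bpftool cgroup list "$cg" | grep -c ' device ')   # count attached device programs
    [ "$n" -gt 1 ] && echo "$cg: $n device programs attached"
done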

Running systemctl daemon-reload again replaces the programs (the IDs change), but the container's cgroup still has two device programs attached:

# systemctl daemon-reload
# bpftool prog list |grep 22d263640f2fec01
13274: cgroup_device  tag 22d263640f2fec01  gpl
13302: cgroup_device  tag 22d263640f2fec01  gpl
13306: cgroup_device  tag 22d263640f2fec01  gpl
13320: cgroup_device  tag 22d263640f2fec01  gpl
# bpftool cgroup list /sys/fs/cgroup//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-poddcf4735c_2ae9_4e49_9597_46647839e3fb.slice/cri-containerd-a88bad885cc806f01ed0011ce0f99d3b4c10f97dd40f0c20671f782d23427cc4.scope/
ID       AttachType      AttachFlags     Name
13129    device          multi
13302    device          multi

Next I try to detach the extra eBPF program:

# bpftool cgroup detach /sys/fs/cgroup//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-poddcf4735c_2ae9_4e49_9597_46647839e3fb.slice/cri-containerd-a88bad885cc806f01ed0011ce0f99d3b4c10f97dd40f0c20671f782d23427cc4.scope/ device id 13302
# bpftool cgroup list /sys/fs/cgroup//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-poddcf4735c_2ae9_4e49_9597_46647839e3fb.slice/cri-containerd-a88bad885cc806f01ed0011ce0f99d3b4c10f97dd40f0c20671f782d23427cc4.scope/
ID       AttachType      AttachFlags     Name
13129    device          multi

After detaching the wrong eBPF program, everything works again:

# kubectl exec  app-5d77f99955-nknpt  -- nvidia-smi -L 
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-5a703cdc-c4cc-4a16-0559-88232312123)
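
The same recovery can be scripted for other affected containers. A rough sketch, assuming (as observed above) that the device program with the lowest ID is the one attached at container creation and should be kept; check the IDs with bpftool prog show before detaching anything:

cg=/sys/fs/cgroup/kubepods.slice/<...>/cri-containerd-<id>.scope   # affected scope (placeholder path)
ids=$(bpftool cgroup list "$cg" | awk '$2 == "device" { print $1 }')
keep=$(printf '%s\n' $ids | sort -n | head -n1)                    # oldest program by ID
for id in $ids; do
    [ "$id" = "$keep" ] || bpftool cgroup detach "$cg" device id "$id"
done
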
vissible commented 11 months ago

I can also reproduce the issue on Ubuntu 20.04 (cgroup v1); here are the details:

# ps -o pid,comm,cgroup 2206961
    PID COMMAND         CGROUP
2206961 app            12:app:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc3a94178_40e0_4739_bcad_4ee8ec576a0f.slice/cri-containerd-c70324c0804812a23c105488e0cb302d3611089c2b75a7b98
# ls /sys/fs/cgroup/devices//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc3a94178_40e0_4739_bcad_4ee8ec576a0f.slice/cri-containerd-c70324c0804812a23c105488e0cb302d3611089c2b75a7b98ff7d094ca9c9e3d.scope/
cgroup.clone_children  cgroup.procs           devices.allow          devices.deny           devices.list           notify_on_release      tasks
# cat  /sys/fs/cgroup/devices//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc3a94178_40e0_4739_bcad_4ee8ec576a0f.slice/cri-containerd-c70324c0804812a23c105488e0cb302d3611089c2b75a7b98ff7d094ca9c9e3d.scope/devices.list
c 136:* rwm
c 10:200 rwm
c 5:2 rwm
c 5:0 rwm
c 1:9 rwm
c 1:8 rwm
c 1:7 rwm
c 1:5 rwm
c 1:3 rwm
c *:* m
b *:* m
c 195:255 rw
c 508:0 rw
c 508:1 rw
c 195:7 rw
# systemctl daemon-reload
# cat  /sys/fs/cgroup/devices//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc3a94178_40e0_4739_bcad_4ee8ec576a0f.slice/cri-containerd-c70324c0804812a23c105488e0cb302d3611089c2b75a7b98ff7d094ca9c9e3d.scope/devices.list
b *:* m
c *:* m
c 1:3 rwm
c 1:5 rwm
c 1:7 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:2 rwm
c 10:200 rwm
c 136:* rwm
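
So on cgroup v1 the daemon-reload rewrites devices.list and the NVIDIA entries (195:255, 508:0, 508:1, 195:7) disappear. As a manual workaround sketch of my own (not something from the toolkit), the dropped rules can be written back to devices.allow of the affected scope:

scope=/sys/fs/cgroup/devices/kubepods.slice/<...>/cri-containerd-<id>.scope   # placeholder path
for rule in 'c 195:255 rw' 'c 508:0 rw' 'c 508:1 rw' 'c 195:7 rw'; do
    echo "$rule" > "$scope/devices.allow"
done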
tryauuum commented 7 months ago

I think I have exactly the same issue. nvidia-smi doesn't work because it tries to access /dev/nvidiactl and, with two eBPF programs attached, it cannot open it. Worst of all, it doesn't happen on every boot, so there's a race condition somewhere.

I will work around it by manually adding all the devices to the container:

--device=/dev/nvidia{-uvm-tools,-uvm,ctl} $( for DEVICE in $( find /dev/ -type c -name 'nvidia?' ); do echo --device=$DEVICE; done )
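
Put into a full invocation, that looks roughly like the sketch below; the image name, the --gpus flag, and the extra nvidia-uvm device nodes are my assumptions, adjust for your setup:

docker run --rm --gpus all \
    --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia-uvm-tools \
    $( for DEVICE in $( find /dev/ -type c -name 'nvidia?' ); do echo --device=$DEVICE; done ) \
    <cuda-image> nvidia-smi -L

The idea, as far as I can tell, is that devices passed with --device become part of the container's own device cgroup rules, so they stay allowed by whatever program systemd regenerates on daemon-reload.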