Open JohanAR opened 2 years ago
Running this command allows me to run nvidia-smi in docker:
setsebool -P container_use_devices 1
But then Stable Diffusion says it runs out of VRAM when starting up. It worked perfectly fine before, despite the SELinux audit denial. I have no idea what's going on :(
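For reference, one way to confirm the boolean state and look at recent SELinux denials (assuming the standard policycoreutils and audit tools are installed):
getsebool container_use_devices   # prints on/off for the boolean
sudo ausearch -m avc -ts recent   # requires auditd; lists recent AVC denials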
@JohanAR could you downgrade to NVIDIA Container Toolkit 1.10.0 (including the libnvidia-container* packages) to check whether this is a regression in the new version of the toolkit?
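For reference, a downgrade along these lines should be enough, assuming the 1.10.0 packages are still available in the configured repository (exact NVRs may differ):
sudo dnf downgrade nvidia-container-toolkit-1.10.0 libnvidia-container1-1.10.0 libnvidia-container-tools-1.10.0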
@elezar downgraded, rebooted and tried again but no difference
/s/P/AUTOMATIC111-sd-webui ❯❯❯ rpm -qa | grep nvidia
xorg-x11-drv-nvidia-kmodsrc-515.65.01-1.fc36.x86_64
xorg-x11-drv-nvidia-cuda-libs-515.65.01-1.fc36.x86_64
xorg-x11-drv-nvidia-libs-515.65.01-1.fc36.x86_64
nvidia-settings-515.65.01-1.fc36.x86_64
xorg-x11-drv-nvidia-power-515.65.01-1.fc36.x86_64
xorg-x11-drv-nvidia-515.65.01-1.fc36.x86_64
nvidia-persistenced-515.65.01-1.fc36.x86_64
xorg-x11-drv-nvidia-libs-515.65.01-1.fc36.i686
xorg-x11-drv-nvidia-cuda-libs-515.65.01-1.fc36.i686
kmod-nvidia-5.19.7-200.fc36.x86_64-515.65.01-1.fc36.x86_64
kmod-nvidia-5.19.8-200.fc36.x86_64-515.65.01-1.fc36.x86_64
xorg-x11-drv-nvidia-cuda-515.65.01-1.fc36.x86_64
akmod-nvidia-515.65.01-1.fc36.x86_64
nvidia-vaapi-driver-0.0.6-11.fc36.x86_64
nvidia-gpu-firmware-20220815-139.fc36.noarch
nvidia-modprobe-515.65.01-1.fc36.x86_64
kmod-nvidia-515.65.01-1.fc36.x86_64
kmod-nvidia-5.19.9-200.fc36.x86_64-515.65.01-1.fc36.x86_64
nvidia-xconfig-515.65.01-1.fc36.x86_64
libnvidia-container1-1.10.0-1.x86_64
libnvidia-container-tools-1.10.0-1.x86_64
nvidia-container-toolkit-1.10.0-1.x86_64
/s/P/AUTOMATIC111-sd-webui ❯❯❯ docker run --rm --gpus all --runtime nvidia nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Failed to initialize NVML: Insufficient Permissions
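For reference, one way to check whether SELinux container labelling is the blocker (untested here) is to disable label separation for a single run:
docker run --rm --gpus all --runtime nvidia --security-opt label=disable nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
If that works while the plain command above still fails with the NVML permission error, it points at the SELinux policy rather than the toolkit itself.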
Uninstalling moby-engine and switching back to docker-ce works fine with the 1.10.0-1 versions. The above nvidia-smi command worked immediately, but I had to recreate my other docker images for them to be able to access CUDA. Maybe that's normal with SELinux.
Don't know if it's relevant, but there was an update to the container-selinux package. It sounds like it could be related to SELinux permissions for docker containers, but that's just a guess. I haven't had the time or motivation to try going back to moby-engine, since docker-ce is currently working for me.
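For reference, the install date of the currently installed container-selinux build can be checked with:
rpm -q --last container-selinux   # package name and install date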
Problem:
Background: I replaced moby-engine on Fedora 36 with docker-ce from https://download.docker.com/linux/fedora/docker-ce.repo because I thought that was necessary to use nvidia docker. That worked perfectly fine for a few days until it stopped working after an update, so I thought I'd follow @elezar's tip and try running it with moby instead. I removed all the packages I got from docker-ce and disabled that repo, then installed moby-engine and nvidia-container-toolkit, but now running nvidia-smi in docker no longer works (it did with docker-ce and nvidia-docker2) because of SELinux. However, it seems I could still access the GPU from a different docker image, which was running Stable Diffusion.
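For reference, the package swap described above would look roughly like this (repo id and package set assumed from the standard docker-ce.repo; adjust to what is actually installed):
sudo dnf remove docker-ce docker-ce-cli containerd.io
sudo dnf config-manager --set-disabled docker-ce-stable   # repo id assumed from docker-ce.repo
sudo dnf install moby-engine nvidia-container-toolkit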
Possibly not related, but when installing the container-selinux package (a dependency of moby-engine) it freezes for close to 10 minutes while running a scriptlet. After that it continues and appears to succeed.
System info:
Syslog: