NVIDIA / libnvidia-container

NVIDIA container runtime library
Apache License 2.0

Using nvidia-container-cli with systemd-nspawn containers: nvml error: driver not loaded #206

Closed Jip-Hop closed 1 year ago

Jip-Hop commented 1 year ago

I'm trying to make the nvidia driver available inside a systemd-nspawn container by bind-mounting the files listed by nvidia-container-cli list. It seems to be working (sort of), but we're running into some issues like:

nvidia-container-cli: initialization error: nvml error: driver not loaded

Are there any special steps required to make the driver available (and work properly) inside the container?

There's this answer from 2018 by @3XX0 saying:

list shows partially what needs to be injected inside containers.

I'm very interested to know what else there needs to be done.

According to the Arch Linux wiki, it should be possible to mount the nvidia drivers from the host inside the nspawn container. However, we're trying to make it work on a Debian 11 based appliance OS (TrueNAS SCALE), so we can't follow the instructions in the wiki.

The OS inside the container is also Debian 11 and is installed from the LXC image provided by https://images.linuxcontainers.org.

Our attempt can be found in this issue: https://github.com/Jip-Hop/jailmaker/issues/4.
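For readers following along, here is a minimal sketch of the bind-mounting idea described in this comment. It assumes that nvidia-container-cli list prints one host path per line, and that device nodes should be bound read-write while everything else can be bound read-only. The function name paths_to_bind_flags is hypothetical; --bind= and --bind-ro= are regular systemd-nspawn options. This is not necessarily how the linked jailmaker issue does it, and device nodes may additionally need to be allowed through the container's device policy, which is not shown here.

```shell
#!/bin/sh
# Sketch: turn a list of host paths (one per line on stdin) into
# systemd-nspawn bind flags, one flag per output line.
# Assumption: paths under /dev are device nodes and get a read-write
# --bind=, every other path gets a read-only --bind-ro=.
paths_to_bind_flags() {
    while IFS= read -r path; do
        # Skip empty lines.
        [ -n "$path" ] || continue
        case "$path" in
            /dev/*) printf '%s%s\n' '--bind=' "$path" ;;
            *)      printf '%s%s\n' '--bind-ro=' "$path" ;;
        esac
    done
}

# Intended use on a host where the driver is already loaded (not run here):
#   nvidia-container-cli list | paths_to_bind_flags
```

The output is one flag per line, so it can be inspected before being passed to systemd-nspawn. Paths containing whitespace would need extra care when the flags are later split into arguments.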

Jip-Hop commented 1 year ago

We got it working by running:

[ ! -e /dev/nvidia-uvm ] && modprobe nvidia-current-uvm && /usr/bin/nvidia-modprobe -c0 -u

Then we ran nvidia-smi -f /dev/null. We ran both before nvidia-container-cli list.

We also had to create a .conf file inside the container under /etc/ld.so.conf.d/ containing the line /usr/lib/x86_64-linux-gnu/nvidia/current; otherwise ldconfig wouldn't pick up the mounted nvidia libraries.
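The linker configuration step can be sketched roughly as follows. The library directory is the one named in this comment. The file name nvidia.conf and the helper write_nvidia_ldconf are hypothetical, since the comment only says "a .conf file". The helper takes the container's root directory as an argument so it can be used either from the host or from inside the container.

```shell
#!/bin/sh
# Library directory named in the thread.
NVIDIA_LIB_DIR=/usr/lib/x86_64-linux-gnu/nvidia/current

# write_nvidia_ldconf ROOT [NAME]
# Writes ROOT/etc/ld.so.conf.d/NAME with the library directory as its
# only line. NAME defaults to nvidia.conf (a hypothetical file name).
write_nvidia_ldconf() {
    root=${1%/}                 # "/" becomes "", other roots lose a trailing slash
    name=${2:-nvidia.conf}
    mkdir -p "$root/etc/ld.so.conf.d" || return 1
    printf '%s\n' "$NVIDIA_LIB_DIR" > "$root/etc/ld.so.conf.d/$name"
}

# Inside the container, write the file and refresh the linker cache
# (not run here):
#   write_nvidia_ldconf / && ldconfig
```

Running ldconfig afterwards is what makes the dynamic linker pick up the newly listed directory.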