Jip-Hop / jailmaker

Persistent Linux 'jails' on TrueNAS SCALE to install software (k3s, docker, portainer, podman, etc.) with full access to all files via bind mounts thanks to systemd-nspawn!
GNU Lesser General Public License v3.0

Nvidia passthrough broken #4

Closed: Ixian closed this issue 1 year ago

Ixian commented 1 year ago

Getting this error:


-- WARNING, the following logs are for debugging purposes only --

I0227 16:30:43.055366 3314 nvc.c:376] initializing library context (version=1.12.0, build=7678e1af094d865441d0bc1b97c3e72d15fcab50)
I0227 16:30:43.055432 3314 nvc.c:350] using root /
I0227 16:30:43.055437 3314 nvc.c:351] using ldcache /etc/ld.so.cache
I0227 16:30:43.055442 3314 nvc.c:352] using unprivileged user 65534:65534
I0227 16:30:43.055460 3314 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0227 16:30:43.055577 3314 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
I0227 16:30:43.057645 3315 nvc.c:278] loading kernel module nvidia
I0227 16:30:43.057787 3315 nvc.c:282] running mknod for /dev/nvidiactl
I0227 16:30:43.057820 3315 nvc.c:286] running mknod for /dev/nvidia0
I0227 16:30:43.057840 3315 nvc.c:290] running mknod for all nvcaps in /dev/nvidia-caps
I0227 16:30:43.063197 3315 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I0227 16:30:43.063256 3315 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I0227 16:30:43.064371 3315 nvc.c:296] loading kernel module nvidia_uvm
I0227 16:30:43.064395 3315 nvc.c:300] running mknod for /dev/nvidia-uvm
I0227 16:30:43.064434 3315 nvc.c:305] loading kernel module nvidia_modeset
I0227 16:30:43.064464 3315 nvc.c:309] running mknod for /dev/nvidia-modeset
I0227 16:30:43.064644 3316 rpc.c:71] starting driver rpc service
I0227 16:30:43.064985 3314 rpc.c:135] driver rpc service terminated with signal 15
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0227 16:30:43.065009 3314 nvc.c:434] shutting down library context

Looks like everything might not be getting passed through.
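
For what it's worth, a couple of quick checks inside the jail can show whether the library was bind-mounted at all or just isn't in the linker cache (sketch):

ldconfig -p | grep libnvidia-ml                 # is libnvidia-ml known to the dynamic linker?
find / -name 'libnvidia-ml.so*' 2>/dev/null     # was the file mounted anywhere in the jail?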

Jip-Hop commented 1 year ago

@Talung I don't understand the issue.

but obviously no nvidia docker runtime.

What does that mean?

Sounds like everything is working in Ubuntu for you (including nvidia drivers and docker). If that's the case and the issue is that you can't get the debian image to work, then please open a new issue and post the errors you're running into. Perhaps you could also remove the hidden lxc directory (check with ls -la in the script directory) to ensure the debian image will be freshly downloaded.
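
Clearing the cache would look something like this (the hidden directory is the .lxc image cache mentioned further down; adjust the path to wherever jlmkr.py lives):

cd /mnt/pond/jailmaker   # example path, use your own script directory
ls -la                   # the hidden image cache should show up here
rm -rf .lxc              # forces a fresh download of the debian image on the next create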

Talung commented 1 year ago

@Talung I don't understand the issue.

but obviously no nvidia docker runtime.

Sorry, I didn't make myself clear. I.e., when running the docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi command, it specifies --runtime=nvidia, which doesn't exist until you add the nvidia-container-toolkit and update the daemon.json.
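
For reference, installing nvidia-container-toolkit ends up adding a runtime entry like this to /etc/docker/daemon.json (exact contents can vary by toolkit version; newer versions can generate it with nvidia-ctk runtime configure --runtime=docker):

cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}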

I will start another debian image, but I remember not getting any access to nvidia-smi, which was available in the ubuntu version. It could also be an old image, which I will attempt again.

Will let you know the results.

UPDATE: It was the cache. As soon as I cleared it and created a new image, nvidia-smi was there. UPDATE2: fully works in debian11 now.

root@debianjail:~# docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Unable to find image 'nvidia/cuda:11.6.2-base-ubuntu20.04' locally
11.6.2-base-ubuntu20.04: Pulling from nvidia/cuda
846c0b181fff: Pull complete
b787be75b30b: Pull complete
40a5337e592b: Pull complete
8055c4cd4ab2: Pull complete
a0c882e23131: Pull complete
Digest: sha256:9928940c6e88ed3cdee08e0ea451c082a0ebf058f258f6fbc7f6c116aeb02143
Status: Downloaded newer image for nvidia/cuda:11.6.2-base-ubuntu20.04
Fri Mar  3 13:41:21 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 29%   38C    P5    20W / 180W |      0MiB /  8192MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Ixian commented 1 year ago

What does nvidia-container-cli list return, on the host and in the jail?

Try it right after you start the jail and access the shell too.

Talung commented 1 year ago

On the TrueNAS box:

root@truenas[~]# nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-modeset
/dev/nvidia0
/usr/lib/nvidia/current/nvidia-smi
/usr/bin/nvidia-persistenced
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01

And in the Debian machine, straight after it starts:

root@truenas[/mnt/pond/jailmaker]# machinectl shell debianjail
Connected to machine debianjail. Press ^] three times within 1s to exit session.
root@debianjail:~# nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-modeset
/dev/nvidia0
/usr/bin/nvidia-smi
/usr/bin/nvidia-persistenced
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01

My box has an old GTX 1070 card, so not every feature may be available, but Jellyfin etc. is definitely seeing it now.

Ixian commented 1 year ago

Would you mind trying a little experiment to see if you can reproduce the problem I'm seeing?

Stop your jail: machinectl stop _yourjailname_

Start it again and check the full output that returns. When I start a jail I get this:

  sudo ./jlmkr.py start jaildocker2
Config loaded!
nvidia-container-cli: initialization error: nvml error: driver not loaded

Failed to run nvidia-container-cli.
Unable to detect which nvidia driver files to mount.
Falling back to hard-coded list of nvidia files...
ldconfig: File /lib/x86_64-linux-gnu/libnvidia-ml.so.1 is empty, not checked.

Inside the jail shell I can successfully run nvidia-smi; however, nvidia-container-cli list fails:

 nvidia-container-cli list
nvidia-container-cli: initialization error: nvml error: driver not loaded

Only when I bring up my compose stack, which includes Plex, Emby, and Tdarr (all use the GPU), does the error go away.

Though it seems like a non-critical error (because GPU passthrough eventually does work), something is clearly still not working, and that is what we are trying to run down. It would be really helpful to confirm whether this happens to anyone but me (since @Jip-Hop doesn't have an Nvidia GPU to test with).

Talung commented 1 year ago

Ok, so I did what you asked, and I seem to have no issues whatsoever. Here are the outputs:

root@truenas[/mnt/pond/jailmaker]# ./jlmkr.py start debianjail
Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-debianjail --working-directory=./jails/debianjail '--description=My nspawn jail debianjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=debianjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01 --bind=/dev/nvidia-caps --bind=/dev/nvidiactl --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01 --bind-ro=/usr/bin/nvidia-persistenced --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01 --bind=/dev/nvidia0 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01 --bind-ro=/usr/bin/nvidia-smi --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 --bind-ro=/usr/lib/nvidia/current/nvidia-smi --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01

Starting jail with name: debianjail

Running as unit: jlmkr-debianjail.service

Check logging:
journalctl -u jlmkr-debianjail

Check status:
systemctl status jlmkr-debianjail

Stop the jail:
machinectl stop debianjail

Get a shell:
machinectl shell debianjail

root@truenas[/mnt/pond/jailmaker]# machinectl shell debianjail
Connected to machine debianjail. Press ^] three times within 1s to exit session.
root@debianjail:~# nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-modeset
/dev/nvidia0
/usr/bin/nvidia-smi
/usr/bin/nvidia-persistenced
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01
root@debianjail:~# exit
logout
Connection to machine debianjail terminated.

Stopping and restarting the jail

root@truenas[/mnt/pond/jailmaker]# machinectl stop debianjail
root@truenas[/mnt/pond/jailmaker]# ./jlmkr.py start debianjail
Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-debianjail --working-directory=./jails/debianjail '--description=My nspawn jail debianjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=debianjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind-ro=/usr/bin/nvidia-smi --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01 --bind-ro=/usr/bin/nvidia-persistenced --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01 --bind=/dev/nvidia-caps --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01 --bind-ro=/usr/lib/nvidia/current/nvidia-smi --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01 --bind=/dev/nvidiactl --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01 --bind=/dev/nvidia0 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01

Starting jail with name: debianjail

Running as unit: jlmkr-debianjail.service

Check logging:
journalctl -u jlmkr-debianjail

Check status:
systemctl status jlmkr-debianjail

Stop the jail:
machinectl stop debianjail

Get a shell:
machinectl shell debianjail

root@truenas[/mnt/pond/jailmaker]# machinectl shell debianjail
Connected to machine debianjail. Press ^] three times within 1s to exit session.
root@debianjail:~# nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-modeset
/dev/nvidia0
/usr/bin/nvidia-smi
/usr/bin/nvidia-persistenced
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01
root@debianjail:~# exit

This is on the Debian 11 jail I just created, using the exact same sequence as I did for the Ubuntu machine. Maybe I stumbled on a good installation sequence? I am only really familiar with LXC and containers through the use of Proxmox, so this is all fairly new to me; I've only been using Docker for a few months as well.

Is this the info you were seeking? Let me know if you want any other tests done.

Jip-Hop commented 1 year ago

Thanks @Talung. Looks good!

@Ixian please double check you have the latest script and try a fresh debian jail.

ldconfig: File /lib/x86_64-linux-gnu/libnvidia-ml.so.1 is empty, not checked.

This looks like a file left over from running a previous version of the script.
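
A quick way to spot such leftovers from the host would be something like this (sketch; the jails/<name>/rootfs layout matches the start commands quoted above):

find ./jails/jaildocker2/rootfs -name 'libnvidia*' -size 0   # empty placeholder files left behind by the old script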

Jip-Hop commented 1 year ago

And reboot as well please just to rule that out.

Talung commented 1 year ago

Give me a few... I'm going to disable the start of the Ubuntu jail that runs my main dockers. Then I will reboot, get the latest image (and also remove the .lxc cache), and run through my installation scripts.

I will post results after installation, and then again after stopping and starting the virtual machine. Do you want another reboot between starts?

Jip-Hop commented 1 year ago

@Talung your stuff looks good. Only @Ixian should try those steps :)

Talung commented 1 year ago

@Talung your stuff looks good. Only @Ixian should try those steps :)

Oops. I just started doing the stuff again... it won't hurt :D and will confirm it with a fresh approach. :)

Ixian commented 1 year ago

It's starting to look like I broke something with my Scale installation. nvidia-container-cli list doesn't work inside or outside the jail.

Talung commented 1 year ago

Hmm... very interesting. My experiment is showing problems now:

root@debianjail:~# docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Unable to find image 'nvidia/cuda:11.6.2-base-ubuntu20.04' locally
11.6.2-base-ubuntu20.04: Pulling from nvidia/cuda
846c0b181fff: Pull complete
b787be75b30b: Pull complete
40a5337e592b: Pull complete
8055c4cd4ab2: Pull complete
a0c882e23131: Pull complete
Digest: sha256:9928940c6e88ed3cdee08e0ea451c082a0ebf058f258f6fbc7f6c116aeb02143
Status: Downloaded newer image for nvidia/cuda:11.6.2-base-ubuntu20.04
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr:                    Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

and

root@debianjail:~# nvidia-container-cli list
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory

This is after the reboot, with everything clean. The only real difference was that I got the Ubuntu jammy jail working before I tried the debian one again. And now the ubuntu jail is exhibiting the same issues.

Very weird.

EDIT: nvidia-smi is now available. It seems you need to wait some time after TrueNAS boots for that stuff to become active.

Ixian commented 1 year ago

Yeah, something still isn't right with how the drivers are being pulled from the host to the jail, but it's tricky trying to run things down. I'm seeing inconsistent results too.

Talung commented 1 year ago

Mine is working again. Going through everything, this is what I noticed after the reboot:

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-debianjail --working-directory=./jails/debianjail '--description=My nspawn jail debianjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=debianjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri

Starting jail with name: debianjail

This one failed. Then, while looking around the machine for a reason, I turned on my original ubuntuDocker jail, which also failed. I deleted the new debian one and created the jail again, but this time the command came back with:

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-debianjail --working-directory=./jails/debianjail '--description=My nspawn jail debianjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=debianjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind=/dev/nvidia-caps --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01 --bind-ro=/usr/lib/nvidia/current/nvidia-smi --bind=/dev/nvidiactl --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01 --bind=/dev/nvidia0 --bind-ro=/usr/bin/nvidia-persistenced --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01 --bind-ro=/usr/bin/nvidia-smi

Starting jail with name: debianjail

What it looks like to me is some sort of timing issue. It's almost as if TrueNAS needs to "settle" itself before the cards become available.

EDIT: stopping and restarting the ubuntu jail is now working as expected.

Ixian commented 1 year ago

What I'm finding now is that the card needs to be "initialized" somehow, either on the host or inside the jail.

For example, nvidia-container-cli list will fail (on the host or inside the jail), but nvidia-smi works, and once that has been run, nvidia-container-cli list works too. This is true whether you do it on the host or in the jail.
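
In other words, the workaround order looks like this (sketch; the same on the host and in the jail):

nvidia-smi > /dev/null        # touching the driver once initializes it
nvidia-container-cli list     # only succeeds after that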

The problem is this isn't how it used to work. I'm wondering if we're accidentally modifying system files from the jail.

Talung commented 1 year ago

When I did my reboot and rebuild, I did run nvidia-smi on the root system before starting the builds. I did not run nvidia-container-cli beforehand. I could try that tomorrow, as I am about to go to sleep.

Jip-Hop commented 1 year ago

Maybe we need to enable the nvidia-persistenced service, as suggested by @TrueJournals, before running nvidia-container-cli?

Jip-Hop commented 1 year ago

The nvidia-persistenced utility is used to enable persistent software state in the NVIDIA driver. When persistence mode is enabled, the daemon prevents the driver from releasing device state when the device is not in use. This can improve the startup time of new clients in this scenario. Source.
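
If someone wants to test that theory by hand, both of these exist on the host (sketch; I haven't verified which one TrueNAS expects):

nvidia-persistenced           # start the persistence daemon
nvidia-smi -pm 1              # legacy per-GPU persistence mode toggle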

Jip-Hop commented 1 year ago

The latest script retries nvidia-container-cli 3 times in case it fails. Maybe that helps?

Also, the gpu_passthrough config value is deprecated in favor of gpu_passthrough_nvidia and gpu_passthrough_intel. During jail creation you'll be asked about both if the corresponding GPUs are detected. The new script won't write the gpu_passthrough config value for new jails. If it reads gpu_passthrough from the config, it will try to pass through both intel and nvidia, like it currently does.

Ixian commented 1 year ago

I'd take the retry out - it won't do anything.

What I'm doing at the moment is just using a simple shell script to start the jail:

#!/usr/bin/env bash
sleep 5                   # give TrueNAS a moment after boot
nvidia-smi -f /dev/null   # poke the driver so the /dev/nvidia* nodes get created
sleep 5
/mnt/ssd-storage/jailmaker/jlmkr.py start debianjail

Crude, but works.

Jip-Hop commented 1 year ago

Check this out: setup_nvidia_gpu.

Seems like TrueNAS doesn't fully load/init the GPU by default?

    # We install the nvidia-kernel-dkms package which causes a modprobe file to be written
    # (i.e /etc/modprobe.d/nvidia.conf). This file tries to modprobe all the associated
    # nvidia drivers at boot whether or not your system has an nvidia card installed.
    # For all truenas certified and truenas enterprise hardware, we do not include nvidia GPUS.
    # So to prevent a bunch of systemd "Failed" messages to be barfed to the console during boot,
    # we remove this file because the linux kernel dynamically loads the modules based on whether
    # or not you have the actual hardware installed in the system.
    with contextlib.suppress(FileNotFoundError):
        os.unlink(os.path.join(CHROOT_BASEDIR, 'etc/modprobe.d/nvidia.conf'))

Excerpt from the truenas scale-build repo.

Perhaps instead of calling nvidia-smi we should run:

modprobe nvidia-current-uvm
nvidia-modprobe -c0 -u

Jip-Hop commented 1 year ago

Sounds like this is what we're running into: https://www.reddit.com/r/qnap/comments/s7bbv6/fix_for_missing_nvidiauvm_device_devnvidiauvm/

Jip-Hop commented 1 year ago

Once we get the startup streamlined, we should test for endurance:

Sometimes, after the host has been up for a long time, the /dev/nvidia-uvm or other device nodes may disappear. In this case, simply run the nvidia-uvm-init script, perhaps schedule it to run as a cron job. Source.
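
A minimal cron entry along those lines could look like this (sketch, /etc/cron.d format, using the nvidia-modprobe command from the comment above instead of NVIDIA's nvidia-uvm-init script):

*/15 * * * * root [ -e /dev/nvidia-uvm ] || /usr/bin/nvidia-modprobe -c0 -u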

Jip-Hop commented 1 year ago

I'd take the retry out - it won't do anything.

The retry is out. modprobe nvidia-current-uvm and nvidia-modprobe -c0 -u are in. Please try the latest script 🙂

Ixian commented 1 year ago

I'm sure it won't hurt, but I already knew about that problem and had the modprobe command running as a pre-init. I still had the problem, but perhaps it would be better to have the jail script run it instead. I'll give it a try.

Talung commented 1 year ago

I just tried a reboot with the dockerjail starting as a post-init script; this is with the updated script downloaded. Unfortunately, anything with runtime: nvidia didn't start, meaning passthrough did not happen. Manually stopping and starting it does work now. I did run nvidia-smi before doing that, to make sure TrueNAS picked the GPU up.

Just fyi.

Jip-Hop commented 1 year ago

Any chance you could post the logs of the jailmaker script when it is starting dockerjail after a reboot?

You may need to redirect the output somewhere with > or mail them by piping the output like so: ./jlmkr.py start dockerjail | mail -s "Jailmaker" "youremail@example.com"
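
It's probably worth capturing stderr as well, since that's where the nvidia-container-cli errors end up (sketch; the path is just an example):

/mnt/pond/jailmaker/jlmkr.py start dockerjail > /mnt/pond/jailmaker/loadlog.log 2>&1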

Or you could temporarily disable the startup script and run jlmkr manually after the reboot.

I'm tempted to just call nvidia-smi once before nvidia-container-cli list, just to be done with it.

Talung commented 1 year ago

Sure, no problem. Here is the load log:

root@truenas[/mnt/pond/jailmaker]# cat loadlog.log
Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-dockerjail --working-directory=./jails/dockerjail '--description=My nspawn jail dockerjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=dockerjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind=/mnt/pond/dockerset --bind=/mnt/pond/appdata/ --bind=/mnt/lake/media/ --bind=/mnt/lake/cloud/

Starting jail with name: dockerjail

Check logging:
journalctl -u jlmkr-dockerjail

Check status:
systemctl status jlmkr-dockerjail

Stop the jail:
machinectl stop dockerjail

Get a shell:
machinectl shell dockerjail

There is no nvidia stuff in there. And here is the log after stopping and starting; in between, I ran nvidia-smi and nvidia-container-cli list.

root@truenas[/mnt/pond/jailmaker]# machinectl stop dockerjail
root@truenas[/mnt/pond/jailmaker]# ./jlmkr.py start dockerjail
Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-dockerjail --working-directory=./jails/dockerjail '--description=My nspawn jail dockerjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=dockerjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 --bind=/dev/nvidia-uvm-tools --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01 --bind=/dev/nvidiactl --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01 --bind=/dev/nvidia-uvm --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01 --bind-ro=/usr/bin/nvidia-persistenced --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01 --bind=/dev/nvidia-caps --bind-ro=/usr/lib/nvidia/current/nvidia-smi --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01 --bind=/dev/nvidia0 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01 --bind-ro=/usr/bin/nvidia-smi --bind=/mnt/pond/dockerset --bind=/mnt/pond/appdata/ --bind=/mnt/lake/media/ --bind=/mnt/lake/cloud/

Starting jail with name: dockerjail

Running as unit: jlmkr-dockerjail.service

Check logging:
journalctl -u jlmkr-dockerjail

Check status:
systemctl status jlmkr-dockerjail

Stop the jail:
machinectl stop dockerjail

Get a shell:
machinectl shell dockerjail

My post init command is as follows:

/mnt/pond/jailmaker/jlmkr.py start dockerjail > /mnt/pond/jailmaker/loadlog.log

Unfortunately I won't be able to do a lot more testing for the next week, as I'm packing up the PCs soon and moving. Hopefully by next Thursday I will have most of the stuff up and running again so I can do more testing.

Jip-Hop commented 1 year ago

Thanks @Talung. Was that with the latest script? I was expecting to see "No nvidia GPU seems to be present... Skip passthrough of nvidia GPU." in the first case.

But I think it's clear that the /dev/nvidia* devices don't exist yet that soon after boot, so I can't rely on them to detect whether an nvidia GPU is installed.
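
One alternative would be to detect the GPU without relying on the device nodes at all, e.g. (sketch, not the script's actual code):

nvidia-smi > /dev/null 2>&1 && echo 'nvidia GPU present'           # initializes the driver as a side effect
lspci 2>/dev/null | grep -qi nvidia && echo 'nvidia GPU present'   # needs no driver init at all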

Talung commented 1 year ago

Yes. This morning I read all the other posts, then ran the update and did a reboot. So unless the script has changed in the last 7 hours, that should be the latest script. Maybe add a little version number to the output so we can confirm that sort of thing. A "run log" stored where you store the config would also be good for debugging.

Just suggestions. :)

Jip-Hop commented 1 year ago

Thanks @Talung. Versioning has started. We're at v0.0.1.

If anyone could test the following sequence:

Was the jail started with nvidia gpu passthrough working (without manually running nvidia-smi or modprobe)?

Talung commented 1 year ago

Did you change anything else in the script besides the versioning? I was going through those tests you suggested: got the latest script (with version numbers), disabled the post-init run (but actually I didn't, because I didn't hit the save button), and rebooted.

Did the whole setup:

root@truenas[~]# uptime
 18:29:49 up 1 min,  1 user,  load average: 7.09, 1.97, 0.68
root@truenas[~]# cd /mnt/pond/jailmaker
root@truenas[/mnt/pond/jailmaker]# ./jlmkr.py create testjail
USE THIS SCRIPT AT YOUR OWN RISK!
IT COMES WITHOUT WARRANTY AND IS NOT SUPPORTED BY IXSYSTEMS.

Install the recommended distro (Debian 11)? [Y/n]

Enter jail name: testjail

Docker won't be installed by jlmkr.py.
But it can setup the jail with the capabilities required to run docker.
You can turn DOCKER_COMPATIBLE mode on/off post-install.

Make jail docker compatible right now? [y/N] y

Detected the presence of an intel GPU.
Passthrough the intel GPU? [y/N] y
Detected the presence of an nvidia GPU.
Passthrough the nvidia GPU? [y/N] y

WARNING: CHECK SYNTAX

You may pass additional flags to systemd-nspawn.
With incorrect flags the jail may not start.
It is possible to correct/add/remove flags post-install.

Show the man page for systemd-nspawn? [y/N]

You may read the systemd-nspawn manual online:
https://manpages.debian.org/bullseye/systemd-container/systemd-nspawn.1.en.html

For example to mount directories inside the jail you may add:
--bind='/mnt/data/a writable directory/' --bind-ro='/mnt/data/a readonly directory/'

Additional flags:

Using image from local cache
Unpacking the rootfs

---
You just created a Debian bullseye amd64 (20230303_05:25) container.

To enable SSH, run: apt install openssh-server
No default root or user password are set by LXC.

Do you want to start the jail? [Y/n] y
Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-testjail --working-directory=./jails/testjail '--description=My nspawn jail testjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=testjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind-ro=/usr/lib/nvidia/current/nvidia-smi --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01 --bind-ro=/usr/bin/nvidia-persistenced --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01 --bind-ro=/usr/bin/nvidia-smi --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01 --bind=/dev/nvidia-caps --bind=/dev/nvidia0 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01 --bind=/dev/nvidiactl

Starting jail with name: testjail

Running as unit: jlmkr-testjail.service

Check logging:
journalctl -u jlmkr-testjail

Check status:
systemctl status jlmkr-testjail

Stop the jail:
machinectl stop testjail

Get a shell:
machinectl shell testjail

And then I noticed I had an email from Watchtower, which made me realise I hadn't saved the "disabled" change. However, this time the GPU initialised on boot. Here is the log:

root@truenas[/mnt/pond/jailmaker]# cat loadlog.log
Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-dockerjail --working-directory=./jails/dockerjail '--description=My nspawn jail dockerjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=dockerjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01 --bind-ro=/usr/lib/nvidia/current/nvidia-smi --bind-ro=/usr/bin/nvidia-smi --bind=/dev/nvidia0 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01 --bind=/dev/nvidiactl --bind-ro=/usr/bin/nvidia-persistenced --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01 --bind=/dev/nvidia-caps --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01 --bind=/mnt/pond/dockerset --bind=/mnt/pond/appdata/ --bind=/mnt/lake/media/ --bind=/mnt/lake/cloud/

Starting jail with name: dockerjail

Looking at the commit history, I see some other changes were made, and whatever it was, it seems to have worked.

EDIT: for funsies I did another reboot and guess what... the GPU was in the jail again!

Jip-Hop commented 1 year ago

Sounds good! Thanks @Talung

Yes, I did more than increment the version number hehe ^^

Detected the presence of an nvidia GPU.
Passthrough the nvidia GPU? [y/N] y

This looks good, as it detected the nvidia GPU straight after reboot thanks to nvidia-smi. It no longer depends on the /dev/nvidia* devices existing.

And then you did another reboot and it ran nvidia-container-cli list successfully, because the script now runs nvidia-smi beforehand (good idea @Ixian).

So it seems to be working now!?

Talung commented 1 year ago

Well, if working means that I did 2 reboots and the GPU came up both times without issue in a jail with GPU passthrough, then I would say: "Yes, it is working!"

Well done!

CompyENG commented 1 year ago

Grabbed the latest script and just tried a reboot myself, and I'm definitely running into the linked issue.

Everything 'seemed' to be working (nvidia-smi ran successfully in host, jail, and container), but Plex refused to do HW transcoding. I also tried a tensorflow docker container and my GPU wasn't listed.

After poking around for a while, I discovered that I didn't have /dev/nvidia-uvm. The module was loaded, and I even tried unloading and reloading it. I also tried starting nvidia-persistenced, but nothing seemed to work.

I stopped the jail and ran the mknod commands for /dev/nvidia-uvm and /dev/nvidia-uvm-tools:

  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`   # major device number assigned to nvidia-uvm

  mknod -m 666 /dev/nvidia-uvm c $D 0
  mknod -m 666 /dev/nvidia-uvm-tools c $D 0

Then I re-started the jail, and transcoding in Plex worked! I tried the tensorflow container again and it listed my GPU.

So it seems like 'something' is still missing to get the nvidia-uvm device created.

CompyENG commented 1 year ago

Probably worth noting that I'm on TrueNAS SCALE 22.12.1.

It seems that nvidia-modprobe doesn't work because the modules are named nvidia-current-*.ko instead of just nvidia-*.ko:

root@freenas:~# find /lib/modules -name nvidia\*
/lib/modules/5.15.79+truenas/kernel/drivers/net/ethernet/nvidia
/lib/modules/5.15.79+truenas/updates/dkms/nvidia-current-drm.ko
/lib/modules/5.15.79+truenas/updates/dkms/nvidia-current-peermem.ko
/lib/modules/5.15.79+truenas/updates/dkms/nvidia-current-modeset.ko
/lib/modules/5.15.79+truenas/updates/dkms/nvidia-current-uvm.ko
/lib/modules/5.15.79+truenas/updates/dkms/nvidia-current.ko

But nvidia-modprobe is hard-coded to use nvidia-uvm as the module name.

I did get nvidia-modprobe to do the right thing by creating a symbolic link and running depmod:

ln -s /lib/modules/5.15.79+truenas/updates/dkms/nvidia-current-uvm.ko /lib/modules/5.15.79+truenas/updates/dkms/nvidia-uvm.ko
depmod
nvidia-modprobe -c0 -u

After that, /dev/nvidia-uvm exists.

Since the mknod commands are documented by nvidia, that solution feels a bit less 'hacky'.

Ixian commented 1 year ago

@TrueJournals I've had to have the following running as a pre-init command since at least 2 Scale releases past:

[ ! -f /dev/nvidia-uvm ] && modprobe nvidia-current-uvm && /usr/bin/nvidia-modprobe -c0 -u

This keeps the situation you are seeing from happening. It was needed even when I was running docker off the Scale host itself. I've had it in there ever since, and I haven't had the problem you are seeing.

I think @Jip-Hop added it to the script as well, but I believe it is something that needs to happen pre-init if you want your Nvidia GPU to reliably show up in Scale. It has something to do with how iXsystems won't load the driver unless called upon, to avoid boot logging errors. The K3s-backed app system handles it behind the scenes when it is used; we need to do it manually.

CompyENG commented 1 year ago

Thanks for that tip @Ixian! Looks like that will do it. Quick log from boot (without any special init):

root@freenas:~# ls /dev/nvid*
ls: cannot access '/dev/nvid*': No such file or directory
root@freenas:~# lsmod | grep nvid
nvidia_drm             73728  0
nvidia_modeset       1150976  1 nvidia_drm
nvidia              40853504  1 nvidia_modeset
drm_kms_helper        315392  1 nvidia_drm
drm                   643072  4 drm_kms_helper,nvidia,nvidia_drm
root@freenas:~# modprobe nvidia-current-uvm
root@freenas:~# ls /dev/nvid*
ls: cannot access '/dev/nvid*': No such file or directory
root@freenas:~# lsmod | grep nvid
nvidia_uvm           1302528  0
nvidia_drm             73728  0
nvidia_modeset       1150976  1 nvidia_drm
nvidia              40853504  2 nvidia_uvm,nvidia_modeset
drm_kms_helper        315392  1 nvidia_drm
drm                   643072  4 drm_kms_helper,nvidia,nvidia_drm
root@freenas:~# nvidia-modprobe -c0 -u
root@freenas:~# ls /dev/nvid*
/dev/nvidia-uvm  /dev/nvidia-uvm-tools
root@freenas:~# lsmod | grep nvid
nvidia_uvm           1302528  0
nvidia_drm             73728  0
nvidia_modeset       1150976  1 nvidia_drm
nvidia              40853504  2 nvidia_uvm,nvidia_modeset
drm_kms_helper        315392  1 nvidia_drm
drm                   643072  4 drm_kms_helper,nvidia,nvidia_drm

Running nvidia-smi will then create the /dev/nvidia0 and /dev/nvidiactl devices.

Looks like the most recent commit removed the modprobe in favor of just running nvidia-smi.

So, I guess this is the answer for the TODO @Jip-Hop -- nvidia-smi is necessary, but not sufficient :) The modprobe and nvidia-modprobe must be run as well.

Ixian commented 1 year ago

@Jip-Hop I just went through the latest script (0.0.1, and thanks for adding versioning) and I think it's really coming together. I like the changes, and I learned a few new things about Python too, so thanks :)

I'm using 0.0.1 now and so far so good. I've gone through multiple reboot tests and everything launches clean and my GPU works; I'm able to use hw transcoding in Plex and Tdarr (tested both after each reboot). I haven't seen any other problems (performance, etc.) yet but will keep an eye on things. I think I'm ready to switch over to this full time vs. running docker directly on the host. Famous last words, but: fingers crossed :)

Ixian commented 1 year ago

Thanks for that tip @Ixian ! Looks like that will do it. Quick log from boot (without any special init):

Running nvidia-smi will then create the /dev/nvidia0 and /dev/nvidiactl devices.

Looks like the most recent commit removed the modprobe in favor of just running nvidia-smi

So, I guess this is the answer for the TODO @Jip-Hop -- nvidia-smi is necessary, but not sufficient :) The modprobe and nvidia-modprobe must be run as well.

Yep, I just saw he removed it as well, BUT I think that's fine. I am pretty certain the correct place to load the modules during boot is pre-init, so probably just an instruction to add it as a pre-init command is enough. That's what we did when we first started running DIY docker on Scale.

Here's a screenshot @Jip-Hop if you want to add it to the readme:

[screenshot: nvidia-modules]

CompyENG commented 1 year ago

With the pre-init script, things are working -- but it looks like nvidia-container-cli doesn't work. It seems that 'something' still isn't initialized without running nvidia-smi, but the latest script checks for /dev/nvidia-uvm to decide whether to run nvidia-smi. I ended up with this error on jlmkr.py start:

nvidia-container-cli: initialization error: nvml error: driver not loaded

Unable to detect which nvidia driver files to mount.
Falling back to hard-coded list of nvidia files...

I decided to just add nvidia-smi to my pre-init command. I also thought it might be a good idea to run nvidia-modprobe regardless of whether the modprobe nvidia-current-uvm works (in case the module name changes to just nvidia-uvm in the future...).

I also changed it to detect the path to modprobe instead of relying on PATH or a hard-coded path. Probably not necessary, but I found it interesting.

So, my final pre-init command is:

[ ! -f /dev/nvidia-uvm ] && ( $(cat /proc/sys/kernel/modprobe) nvidia-current-uvm; /usr/bin/nvidia-modprobe -c0 -u; nvidia-smi -f /dev/null )

Jip-Hop commented 1 year ago

I had no idea it would take 5 days and about 100 comments to get nvidia passthrough working >.<

Updated the script to v0.0.2. I removed some code I think we no longer need, as long as the pre-init command is scheduled (this one or the one above this comment).

Would be great if you could run through the testing sequence again (and run whatever additional tests you think are relevant).

If this works I'll add documentation regarding the pre-init command.

P.S. @TrueJournals if you have an idea how to run ldconfig inside the jail without having to resort to hardcoding /usr/lib/x86_64-linux-gnu/nvidia/current and writing a new .conf file, that would be great. I tried some different things without success, and I'm not too thrilled about the current solution.

CompyENG commented 1 year ago

P.S. @TrueJournals if you have an idea how to run ldconfig inside the jail without having to resort to hardcoding /usr/lib/x86_64-linux-gnu/nvidia/current and writing a new .conf file, that would be great. I tried some different things without success, and I'm not too thrilled about the current solution.

Alright, you got me curious ;) I wanted to know how nvidia handles this, so I dug through libnvidia-container and container-toolkit. Here's what I can tell...

TLDR: They find all unique folders from nvidia-container-cli list, and create a file in /etc/ld.so.conf.d based on that.

nvidia has a hard-coded list of libraries in libnvidia-container. Actually, there are multiple lists, depending on what capabilities you want in the container. In order to find the full path to these libraries, they parse the ldcache file directly to turn the short library names into full paths. You can see that also in find_library_paths.

Over in container-toolkit (which contains the 'hooks' for when containers are created or whatever), there's code to get a list of libraries from "mounts" (a little unclear what these mounts are -- assuming mounts on the container?) by matching paths against lib?*.so* (syntax for Match). In this same file, they have a function that generates a list of unique folders for this list of files.

Finally, they can create a file in /etc/ld.so.conf.d with a random name that lists all these folders and run ldconfig. It looks like this happens outside the container itself by using the -r option on ldconfig.
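
Translated to this jail setup, the same mechanism run from the host would look roughly like this (sketch; the .conf filename is made up, and ldconfig -r treats the given directory as the root before rebuilding the cache):

echo /usr/lib/x86_64-linux-gnu/nvidia/current > ./jails/debianjail/rootfs/etc/ld.so.conf.d/zz-nvidia.conf
ldconfig -r ./jails/debianjail/rootfs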

Now, what I'm still a little confused by is that I don't actually see this happening in my docker container. What's also weird is that libraries show up like this:

root@f7ca5192b700:/# ls -al /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1
lrwxrwxrwx 1 root root 29 Mar  2 17:14 /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 -> libnvidia-encode.so.515.65.01

Even though that library is located at /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 on the "host" (the 'jail' in this case). So it seems like there's another level of remapping and some additional optimization but I'm not quite sure how that works.

Anyway, the logic of 'discover the paths based on the list of libraries' seems reasonable enough. You could even run nvidia-container-cli list --libraries to get the list of libraries (without binaries and other files) if you didn't want to filter down based on filename patterns.
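
For example, deriving the unique directories for the .conf file could be as simple as (sketch):

nvidia-container-cli list --libraries | xargs -n1 dirname | sort -u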

Jip-Hop commented 1 year ago

Thanks for digging into this :)

TLDR: They find all unique folders from nvidia-container-cli list, and create a file in /etc/ld.so.conf.d based on that.

Well, then I will no longer feel bad for writing that file :')

Using the output of nvidia-container-cli list --libraries to determine the content of our .conf file sounds like a nice improvement.

By the way, how is v0.0.2 for you? :)

CompyENG commented 1 year ago

Just tried v0.0.2 and it seems to work fine (I can only reboot my server so many times in a day :laughing: )

Also sent you a PR to implement the above suggestion of discovering library paths based on the output of nvidia-container-cli list --libraries. Tested with new and existing jails locally and it seems to behave fine.

Jip-Hop commented 1 year ago

We're now on v0.0.3 thanks to @TrueJournals :)

I've added the pre-init command instructions to the readme.

Looking forward to hearing from @Ixian and @Talung one last time if all is working properly. Hopefully we can soon close this issue.

Ixian commented 1 year ago

Updated to 0.0.3, rebooted, all working, Plex hw transcoding working.

Question: do we need to re-generate a new jail with each version, i.e. has the CLI launch command in the config file changed? I'm still testing with the jail I created with 0.0.1.

Jip-Hop commented 1 year ago

Nice!

The debugging we did with the script may have left some residual files (symlinks, empty folders), so recreating may not be a bad idea.

But in general my intention is that there should not be a need to regenerate a jail when using a newer version of the script.

Ixian commented 1 year ago

I'm happy to close this now if you want; I think we've gotten it.