Closed Ixian closed 1 year ago
What does the log say after "Starting jail with the following command:" when you start the jail?
Also, what is the output of nvidia-container-cli list on the host?
Thanks for testing and reporting!
sudo ./jlmkr.py start dockerjail
Config loaded!
Starting jail with the following command:
systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-dockerjail --working-directory=./jails/dockerjail '--description=My nspawn jail dockerjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=dockerjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind=/mnt/ssd-storage/appdata/ --bind=/mnt/Slimz/
Starting jail with name: dockerjail
Running as unit: jlmkr-dockerjail.service
Check logging:
journalctl -u jlmkr-dockerjail
Check status:
systemctl status jlmkr-dockerjail
Stop the jail:
machinectl stop dockerjail
Get a shell:
machinectl shell dockerjail
and
$ nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia-modeset
/dev/nvidia0
/usr/lib/nvidia/current/nvidia-smi
/usr/bin/nvidia-persistenced
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01
Thanks! I have just updated the python script. Could you try again please?
Deleted the old jail and started over with a fresh new jail; now getting this error when trying to start it:
Do you want to start the jail? [Y/n] Y
Config loaded!
Traceback (most recent call last):
File "/mnt/ssd-storage/jailmaker/./jlmkr.py", line 666, in <module>
main()
File "/mnt/ssd-storage/jailmaker/./jlmkr.py", line 651, in main
create_jail(args.name)
File "/mnt/ssd-storage/jailmaker/./jlmkr.py", line 613, in create_jail
start_jail(jail_name)
File "/mnt/ssd-storage/jailmaker/./jlmkr.py", line 108, in start_jail
if subprocess.run(['modprobe', 'br_netfilter']).returncode == 0:
File "/usr/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.9/subprocess.py", line 1823, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'modprobe'
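This failure suggests modprobe simply isn't on the PATH of the shell that launched the script (on a Debian-based host like SCALE it typically lives in /usr/sbin). A minimal, hypothetical sketch of making that call path-independent, not the actual jlmkr.py fix:
# Hypothetical sketch: resolve modprobe explicitly instead of relying on PATH.
# The sbin locations below are assumptions about a typical Debian-based host.
import shutil
import subprocess

def try_load_br_netfilter():
    modprobe = shutil.which("modprobe", path="/usr/sbin:/sbin:/usr/local/sbin:/usr/bin:/bin")
    if modprobe is None:
        return False  # modprobe not found; report failure instead of crashing
    return subprocess.run([modprobe, "br_netfilter"]).returncode == 0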
More error detail - there appears to be an extraneous `--bind-ro==` in the generated command line now:
Config loaded!
Starting jail with the following command:
systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-gtjail --working-directory=./jails/gtjail '--description=My nspawn jail gtjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=gtjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind=/dev/nvidiactl --bind=/dev/nvidia-uvm --bind=/dev/nvidia-uvm-tools --bind=/dev/nvidia-modeset --bind=/dev/nvidia0 --bind-ro==/usr/lib/nvidia/current/nvidia-smi --bind-ro==/usr/bin/nvidia-persistenced --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01 --bind-ro== --bind=/mnt/ssd-storage/ --bind=/mnt/Slimz/
Starting jail with name: gtjail
Job for jlmkr-gtjail.service failed.
See "systemctl status jlmkr-gtjail.service" and "journalctl -xe" for details.
Failed to start the jail...
In case of a config error, you may fix it with:
nano jails/gtjail/config
Ah, you're right, that doesn't look good. If you replace the double == with single ones and run the command directly to start the jail, does the Nvidia driver work inside the jail?
If so, we know this approach will work and I should fix the double == in the code.
Thanks for helping. Since I don't have an Nvidia GPU I couldn't test this part :)
Should be fixed now.
Still the same problem - I notice you changed this:
systemd_nspawn_additional_args.append(
    f"--bind-ro={file_path}")
However, it still isn't appending {file_path}; it just outputs a blank "--bind-ro=" and that is what stops the jail from starting.
If I remove the blank argument I can start the jail, however the Nvidia drivers still don't appear to work inside it.
Something in the routine you have for mounting the directories, using that subroutine to detect whether each path is under /dev or not, seems to be broken, but I can't see what.
More info:
The problem I outline above with the extraneous `--bind-ro==` appended to the launch string will prevent the machine from starting. However, you can edit around it, since all the other directories do get bound; it's just adding that blank one at the end. I am not familiar enough with Python and how it handles loops (other than that foreach is implicit) to see the cause, but it is likely simple to fix.
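For what it's worth, a blank --bind-ro= is exactly what you'd get if the output of nvidia-container-cli list is split on newlines and the trailing empty string isn't filtered out before the loop. A rough, hypothetical sketch of the idea, not the actual jlmkr.py code:
# Hypothetical sketch: skip empty lines so no blank --bind-ro= flag is generated.
import subprocess

output = subprocess.run(
    ["nvidia-container-cli", "list"], capture_output=True, text=True
).stdout

systemd_nspawn_additional_args = []
for file_path in output.split("\n"):
    file_path = file_path.strip()
    if not file_path:
        continue  # the trailing newline yields an empty string, which would become a bare --bind-ro=
    if file_path.startswith("/dev/"):
        systemd_nspawn_additional_args.append(f"--bind={file_path}")
    else:
        systemd_nspawn_additional_args.append(f"--bind-ro={file_path}")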
The bigger issue is that it's still not passing through everything needed from the host, as the following error still happens even when I modify the startup command to get the jail running:
root@dockjail:~# nvidia-container-cli list
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
The empty bind-ro line should now be fixed. Thanks! What happens when you run nvidia-smi -a directly inside the jail?
Also please try these steps inside a fresh jail: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/nvidia-docker.html
I think running ldconfig once inside the jail may cause the mounted drivers to be detected: https://github.com/NVIDIA/nvidia-docker/issues/854
Thanks Jip-Hop - the empty bind-ro line is indeed fixed (and I learned something about Python today reading your commit), however the Nvidia problems remain. Even running ldconfig in the jail, or in an Nvidia container, i.e.:
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 /bin/bash -c "ldconfig && nvidia-smi"
Still fails with the same error:
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
Also, nvidia-container-runtime list produces a blank output:
nvidia-container-runtime list
ID PID STATUS BUNDLE CREATED OWNER
and nvidia-smi isn't available inside the jail at all.
As a sanity check: it does all work outside the jail. I double-checked to make sure I hadn't opened a shell on the wrong machine :)
Could you try /usr/lib/nvidia/current/nvidia-smi -a inside the jail? Perhaps after running ldconfig once inside the jail. The nvidia-smi binary should be available inside the jail as far as I can tell from the bind mount flags you've posted. It's probably just not on the PATH, so you need to use the absolute path.
# /usr/lib/nvidia/current/nvidia-smi -a
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
Edit: Also ran ldconfig
I suppose there may still be a (config) file missing in the list of files to bind mount.
This shows the approach should work: https://wiki.archlinux.org/title/systemd-nspawn#Nvidia_GPUs
Maybe something is missing from our list?
Aha!
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
libnvidia-ml.so isn't being passed to the jail; find / -name libnvidia-ml.so returns nothing.
On the Scale host itself it returns
# find / -name libnvidia-ml.so
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so
That library doesn't appear to be bound by your script, looking at how the directories are enumerated, unless I am missing a piece?
Could you search again with a wildcard at the end? It is being bind mounted, but it has a different suffix...
Maybe I need to do something similar to this:
https://github.com/NVIDIA/nvidia-docker/issues/1163#issuecomment-1075053593
Too bad this needs additional investigation...
find / -name libnvidia-ml.so
Finds this, yes: /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01
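Since nvidia-container-cli only lists the fully versioned .so.515.65.01 files, the plain .so and .so.1 names that nvidia-smi and the dynamic loader actually ask for never make it into the bind list. A hypothetical sketch of globbing for every variant in both locations mentioned in this thread (the paths are assumptions, not a complete list):
# Hypothetical sketch: bind every libnvidia-ml.so* variant, not just the
# versioned file reported by nvidia-container-cli.
import glob

systemd_nspawn_additional_args = []  # assuming the same flag list the script builds
for pattern in (
    "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so*",
    "/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so*",
):
    for file_path in glob.glob(pattern):
        systemd_nspawn_additional_args.append(f"--bind-ro={file_path}")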
O.k. so I have now hard-coded it to also mount /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1, since it seems this is not listed by nvidia-container-cli but is required for it to work.
I no longer get the error related to libnvidia-ml.so.1 inside the jail. Now I get this (which I also get on the host, so that's probably related to me not having an Nvidia GPU).
/usr/lib/nvidia/current/nvidia-smi -a
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Has this fixed it for you?
Ah, progress :) Yes, nvidia-smi now picks it up in the jail itself, however it fails inside containers running in the jail. Looks like /usr/lib/nvidia/current needs to be in the system PATH; I imagine that would be better to do with the script?
Actually, the problem is a little weirder.
I ran this (a standard test from the Nvidia site; I've done it dozens of times):
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
And got this error:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown.
But even adding the correct directory to my PATH in .bashrc etc. doesn't fix it. Something else strange is going on here.
It probably needs to be in the PATH system-wide, not just for the current user?
But nice, progress!
Something is off with it and it has to be due to how the drivers are pulled in from the host. We still might be missing something.
When you get to the point that it works inside the jail, but not in a docker container, can you try (after having installed nvidia docker):
docker run --rm --gpus all nvidia/cuda:11.0-base bash -c "ldconfig && nvidia-smi"
Just tried the latest update to test the nvidia part, and am also getting errors starting it. Config file looks fine.
Mar 01 20:51:05 truenas systemd-nspawn[1986823]: Failed to stat /dev/nvidia-modeset: No such file or directory
Mar 01 20:51:05 truenas systemd[1]: jlmkr-dockerjail.service: Main process exited, code=exited, status=1/FAILURE
I can run nvidia-smi
root@truenas[~]# nvidia-smi
Wed Mar 1 20:53:07 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:03:00.0 Off | N/A |
| 29% 38C P5 20W / 180W | 0MiB / 8192MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
and
root@truenas[/mnt/pond/jailmaker]# nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-modeset
/dev/nvidia0
/usr/lib/nvidia/current/nvidia-smi
/usr/bin/nvidia-persistenced
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01
so it is there, just not being picked up? I can't check whether anything is passed through to the jail itself, as I can't get it running.
Inside the jail, please follow the official steps to get nvidia working with Docker: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/nvidia-docker.html
That would also set up the daemon.json file with the nvidia settings.
Then please run ldconfig inside the jail once and then try:
docker run --rm --gpus all nvidia/cuda:11.0-base bash -c "ldconfig && nvidia-smi"
Looking forward to hearing how that goes.
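For reference, the nvidia runtime entry those toolkit setup steps leave in /etc/docker/daemon.json looks roughly like this (an illustrative example based on the standard nvidia-docker configuration, not copied from this thread):
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime"
        }
    }
}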
When you get to the point that it works inside the jail, but not in a docker container, can you try (after having installed nvidia docker):
docker run --rm --gpus all nvidia/cuda:11.0-base bash -c "ldconfig && nvidia-smi"
That has no effect.
The problem now boils down to paths & links that don't match the host.
For example, I couldn't get nvidia-smi to work inside a container because it wasn't being correctly referenced in the jail. I tried creating a symbolic link to it inside /usr/bin (like on the Scale host), but then this error came back:
# docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
Likely because nvidia-smi now can't reference the correct files. All of this boils down to the fact that we're trying to replicate inside the jail how the Scale host has the drivers installed, and we're still missing things.
I'm starting to wonder if it wouldn't just be easier to go back to how we used to do it, i.e. install the correct matching drivers inside the jail vs. trying to leverage all of that from the host. It's not like Scale has dozens of nvidia-driver updates a year; historically they only update them as part of major updates that come once or twice a year. And there's no guarantee they won't change something that breaks this method even if we do get it working.
Here is how Scale handles some links/etc. on my system:
ls -lah /usr/bin/nvidia*
-rwxr-xr-x 1 root root 51K Sep 5 06:52 /usr/bin/nvidia-container-cli
-rwxr-xr-x 1 root root 3.9M Sep 6 02:23 /usr/bin/nvidia-container-runtime
-rwxr-xr-x 1 root root 2.1M Sep 6 02:23 /usr/bin/nvidia-container-runtime-hook
lrwxrwxrwx 1 root root 38 Dec 13 05:45 /usr/bin/nvidia-container-toolkit -> /usr/bin/nvidia-container-runtime-hook
-rwxr-xr-x 1 root root 3.3M Sep 6 02:23 /usr/bin/nvidia-ctk
-rwsr-xr-x 1 root root 174K Jul 21 2022 /usr/bin/nvidia-modprobe
-rwxr-xr-x 1 root root 241K Jul 21 2022 /usr/bin/nvidia-persistenced
lrwxrwxrwx 1 root root 36 Dec 13 05:45 /usr/bin/nvidia-smi -> /etc/alternatives/nvidia--nvidia-smi
ls -lah /etc/alternatives/nvidia*
lrwxrwxrwx 1 root root 23 Dec 13 05:50 /etc/alternatives/nvidia -> /usr/lib/nvidia/current
lrwxrwxrwx 1 root root 59 Dec 13 05:50 /etc/alternatives/nvidia--libGLX_nvidia.so.0-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.0
lrwxrwxrwx 1 root root 51 Dec 13 05:50 /etc/alternatives/nvidia--libcuda.so-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so
lrwxrwxrwx 1 root root 53 Dec 13 05:50 /etc/alternatives/nvidia--libcuda.so.1-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.1
lrwxrwxrwx 1 root root 54 Dec 13 05:50 /etc/alternatives/nvidia--libnvcuvid.so-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so
lrwxrwxrwx 1 root root 56 Dec 13 05:50 /etc/alternatives/nvidia--libnvcuvid.so.1-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.1
lrwxrwxrwx 1 root root 59 Dec 13 05:50 /etc/alternatives/nvidia--libnvidia-cfg.so.1-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.1
lrwxrwxrwx 1 root root 62 Dec 13 05:50 /etc/alternatives/nvidia--libnvidia-encode.so.1-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.1
lrwxrwxrwx 1 root root 58 Dec 13 05:50 /etc/alternatives/nvidia--libnvidia-ml.so.1-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.1
lrwxrwxrwx 1 root root 58 Dec 13 05:50 /etc/alternatives/nvidia--libnvidia-nvvm.so-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-nvvm.so
lrwxrwxrwx 1 root root 60 Dec 13 05:50 /etc/alternatives/nvidia--libnvidia-nvvm.so.4-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-nvvm.so.4
lrwxrwxrwx 1 root root 70 Dec 13 05:50 /etc/alternatives/nvidia--libnvidia-ptxjitcompiler.so.1-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.1
lrwxrwxrwx 1 root root 50 Dec 13 05:50 /etc/alternatives/nvidia--nvidia-blacklists-nouveau.conf -> /etc/nvidia/current/nvidia-blacklists-nouveau.conf
lrwxrwxrwx 1 root root 47 Dec 13 05:50 /etc/alternatives/nvidia--nvidia-drm-outputclass.conf -> /etc/nvidia/current/nvidia-drm-outputclass.conf
lrwxrwxrwx 1 root root 36 Dec 13 05:50 /etc/alternatives/nvidia--nvidia-load.conf -> /etc/nvidia/current/nvidia-load.conf
lrwxrwxrwx 1 root root 40 Dec 13 05:50 /etc/alternatives/nvidia--nvidia-modprobe.conf -> /etc/nvidia/current/nvidia-modprobe.conf
lrwxrwxrwx 1 root root 34 Dec 13 05:50 /etc/alternatives/nvidia--nvidia-smi -> /usr/lib/nvidia/current/nvidia-smi
lrwxrwxrwx 1 root root 39 Dec 13 05:50 /etc/alternatives/nvidia--nvidia-smi.1.gz -> /usr/lib/nvidia/current/nvidia-smi.1.gz
Ah, progress :) Yes, now nvidia-smi picks it up in the jail itself, however it fails inside containers running in the jail. Looks like /usr/lib/nvidia/current needs to be in the system path, imagine that would be better to do with the script?
Let's focus on this first. If we can get the nvidia drivers working inside the jail with the current approach, it should not be far off to make it work inside a docker container in the jail as well.
I agree the old approach is tempting at this point. But I'd prefer a solution which doesn't involve downloading, unpacking and installing drivers each time a jail needs to be created (all the required files are already present on the host, after all).
Please try one more time with the latest script and report the output of nvidia-smi in the jail. It would be good to verify whether GPU acceleration works inside a jail directly.
Different error now (btw I am doing this with clean jails so I can start fresh each testing round):
Preparing to unpack .../libnvidia-container1_1.12.0-1_amd64.deb ...
Unpacking libnvidia-container1:amd64 (1.12.0-1) ...
dpkg: error processing archive /var/cache/apt/archives/libnvidia-container1_1.12.0-1_amd64.deb (--unpack):
unable to make backup link of './usr/lib/x86_64-linux-gnu/libnvidia-container-go.so.1' before installing new version: Invalid cross-device link
Errors were encountered while processing:
/var/cache/apt/archives/libnvidia-container1_1.12.0-1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
This happens when I try to install nvidia-container-toolkit per the Nvidia instructions.
All the Googling I've done so far on the error returns some variation of "re-install the drivers", so I'm guessing some link is still missing or broken.
Update: It's possible that we can now skip the step to install the container toolkit, since we're pulling that in from the host too. That may also be why installing the container toolkit in the jail fails now (I noticed the directory for that file is read-only inside the jail, for understandable reasons, given we don't want to mess with the host).
With a fresh jail install using the latest script, I can open a shell into the jail and successfully run nvidia-smi. I also updated the docker daemon.json file to use the nvidia runtime. However, I still get the same error when I try to have a container inside the jail run nvidia-smi:
root@jaildock:~# sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
root@jaildock:~#
I've been poking at this tonight. I've been trying for a while to upgrade my TrueNAS, found this script, but didn't realize that nvidia support still had some problems.
And... I got something working!
root@docker:~# docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Thu Mar 2 01:42:09 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:0A:00.0 Off | N/A |
| 0% 42C P0 23W / 150W | 0MiB / 6144MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Modifications I made on top of the latest version of the script:
- In the for file_path in nvidia_files loop: /dev/nvidia-modeset doesn't exist on my machine, but nvidia-container-cli list says to use it, so I added a workaround for that. Not sure why I don't have the device.
- Added bind mounts for the container toolkit binaries: --bind=/bin/nvidia-container-cli --bind=/bin/nvidia-container-runtime --bind=/bin/nvidia-container-runtime-hook --bind=/bin/nvidia-container-toolkit
- Created /etc/ld.so.conf.d/nvidia.conf with /usr/lib/x86_64-linux-gnu/nvidia/current and ran ldconfig
Edit: Seems like only ldconfig is necessary from the above. However, even with this I can't get Plex to use HW transcoding. nvidia-smi works in my Plex container, but it refuses to use HW transcoding.
I was able to get /dev/nvidia-modeset to pop into existence by starting the nvidia-persistenced service on TrueNAS. Not sure if this is important or not.
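In script form, that ld.so.conf.d workaround amounts to roughly the following (a sketch based on the path reported above, run as root inside the jail; the script may end up doing this differently):
# Sketch of the workaround described above: tell the jail's dynamic linker where
# the bind-mounted driver libraries live, then refresh the linker cache.
from pathlib import Path
import subprocess

Path("/etc/ld.so.conf.d/nvidia.conf").write_text(
    "/usr/lib/x86_64-linux-gnu/nvidia/current\n"
)
subprocess.run(["ldconfig"], check=True)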
Thanks for all the input, both of you! Glad to see it starts working in the jail, and somewhat inside docker too already :') My intention wasn't to also mount the nvidia-container-toolkit from the host. I probably need to narrow down again to a minimal list of required driver files. Then the driver should work in the jail, and installing nvidia-container-toolkit manually should work. Once nvidia-container-toolkit is installed manually, I expect the GPU to work properly inside docker too. Should also look into nvidia-persistenced... interesting finding!
I've limited the list of files being mounted again. Hopefully the driver still works and allows installing nvidia-container-toolkit manually.
Well well, look what we have here:
Running in a container in the jail :)
I used the latest script and installed docker and the Nvidia toolkit as normal (no errors). The last change I had to make was one of the ones @TrueJournals suggested: created /etc/ld.so.conf.d/nvidia.conf with /usr/lib/x86_64-linux-gnu/nvidia/current and ran ldconfig. That cleared up the last path issue (and I learned something more about how ldconfig works). From there I just mounted my external directories (docker apps and media, the same ones I use to run docker directly on the host today), pulled the container images down, and brought my compose stack up.
I'm still testing various things but so far, so good. Plex hw transcoding works :) I need to go through all my apps (I have a couple dozen in my Compose stacks) so fingers crossed!
Update: minor error with the script now:
sudo ./start-jail.sh
Config loaded!
nvidia-container-cli: initialization error: nvml error: driver not loaded
Failed to run nvidia-container-cli.
Unable to detect which nvidia driver files to mount.
Falling back to hard-coded list of nvidia files...
Attempting to run nvidia-container-cli inside the jail produces the following:
nvidia-container-cli list
nvidia-container-cli: initialization error: nvml error: driver not loaded
However, nvidia-smi still works, as does HW transcoding in Plex. Still, we should chase this down because there are probably going to be other problems due to this error.
I've updated the script again. We are making progress! Thanks @Ixian and @TrueJournals!
The new script already calls ldconfig with an appropriate config file, so you should only need to install docker and the nvidia container toolkit (hopefully no further action needed).
I just tried your latest version @Jip-Hop, but the error I mentioned with nvidia-container-cli is still present.
Maybe this will help: here's the output of dpkg -l '*nvidia*' inside the jail:
dpkg -l '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=============================-============-============-=====================================================
ii libnvidia-container-tools 1.12.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.12.0-1 amd64 NVIDIA container runtime library
un nvidia-container-runtime <none> <none> (no description available)
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.12.0-1 amd64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.12.0-1 amd64 NVIDIA Container Toolkit Base
And here is the output of the same from the Scale host:
dpkg -l '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-======================================-=============-============-=================================================================
un bumblebee-nvidia <none> <none> (no description available)
ii glx-alternative-nvidia 1.2.1~deb11u1 amd64 allows the selection of NVIDIA as GLX provider
un libegl1-glvnd-nvidia <none> <none> (no description available)
un libgl1-glvnd-nvidia-glx <none> <none> (no description available)
un libgl1-nvidia-glx <none> <none> (no description available)
un libgl1-nvidia-legacy-390xx-glx <none> <none> (no description available)
un libgl1-nvidia-tesla-418-glx <none> <none> (no description available)
un libgldispatch0-nvidia <none> <none> (no description available)
un libgles1-glvnd-nvidia <none> <none> (no description available)
un libgles2-glvnd-nvidia <none> <none> (no description available)
un libglvnd0-nvidia <none> <none> (no description available)
ii libglx-nvidia0:amd64 515.65.01-1 amd64 NVIDIA binary GLX library
un libglx0-glvnd-nvidia <none> <none> (no description available)
ii libnvidia-cfg1:amd64 515.65.01-1 amd64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any <none> <none> (no description available)
ii libnvidia-container-tools 1.11.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.11.0-1 amd64 NVIDIA container runtime library
ii libnvidia-eglcore:amd64 515.65.01-1 amd64 NVIDIA binary EGL core libraries
un libnvidia-eglcore-515.65.01 <none> <none> (no description available)
ii libnvidia-encode1:amd64 515.65.01-1 amd64 NVENC Video Encoding runtime library
ii libnvidia-glcore:amd64 515.65.01-1 amd64 NVIDIA binary OpenGL/GLX core libraries
un libnvidia-glcore-515.65.01 <none> <none> (no description available)
ii libnvidia-glvkspirv:amd64 515.65.01-1 amd64 NVIDIA binary Vulkan Spir-V compiler library
un libnvidia-glvkspirv-515.65.01 <none> <none> (no description available)
un libnvidia-legacy-340xx-cfg1 <none> <none> (no description available)
un libnvidia-legacy-390xx-cfg1 <none> <none> (no description available)
un libnvidia-ml.so.1 <none> <none> (no description available)
ii libnvidia-ml1:amd64 515.65.01-1 amd64 NVIDIA Management Library (NVML) runtime library
ii libnvidia-nvvm4:amd64 515.65.01-1 amd64 NVIDIA NVVM
ii libnvidia-ptxjitcompiler1:amd64 515.65.01-1 amd64 NVIDIA PTX JIT Compiler
ii libnvidia-rtcore:amd64 515.65.01-1 amd64 NVIDIA binary Vulkan ray tracing (rtcore) library
un libnvidia-rtcore-515.65.01 <none> <none> (no description available)
un libnvidia-tesla-cfg1 <none> <none> (no description available)
un libopengl0-glvnd-nvidia <none> <none> (no description available)
ii nvidia-alternative 515.65.01-1 amd64 allows the selection of NVIDIA as GLX provider
un nvidia-alternative--kmod-alias <none> <none> (no description available)
un nvidia-alternative-legacy-173xx <none> <none> (no description available)
un nvidia-alternative-legacy-71xx <none> <none> (no description available)
un nvidia-alternative-legacy-96xx <none> <none> (no description available)
ii nvidia-container-runtime 3.11.0-1 all NVIDIA container runtime
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.11.0-1 amd64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.11.0-1 amd64 NVIDIA Container Toolkit Base
un nvidia-cuda-mps <none> <none> (no description available)
un nvidia-current <none> <none> (no description available)
un nvidia-current-updates <none> <none> (no description available)
un nvidia-driver <none> <none> (no description available)
un nvidia-driver-any <none> <none> (no description available)
un nvidia-driver-binary <none> <none> (no description available)
ii nvidia-installer-cleanup 20151021+13 amd64 cleanup after driver installation with the nvidia-installer
un nvidia-kernel-515.65.01 <none> <none> (no description available)
ii nvidia-kernel-common 20151021+13 amd64 NVIDIA binary kernel module support files
ii nvidia-kernel-dkms 515.65.01-1 amd64 NVIDIA binary kernel module DKMS source
un nvidia-kernel-open-dkms-515.65.01 <none> <none> (no description available)
un nvidia-kernel-source <none> <none> (no description available)
ii nvidia-kernel-support 515.65.01-1 amd64 NVIDIA binary kernel module support files
un nvidia-kernel-support--v1 <none> <none> (no description available)
un nvidia-kernel-support-any <none> <none> (no description available)
un nvidia-legacy-304xx-alternative <none> <none> (no description available)
un nvidia-legacy-304xx-driver <none> <none> (no description available)
un nvidia-legacy-340xx-alternative <none> <none> (no description available)
un nvidia-legacy-390xx-vulkan-icd <none> <none> (no description available)
ii nvidia-legacy-check 515.65.01-1 amd64 check for NVIDIA GPUs requiring a legacy driver
ii nvidia-modprobe 515.65.01-1 amd64 utility to load NVIDIA kernel modules and create device nodes
un nvidia-nonglvnd-vulkan-common <none> <none> (no description available)
un nvidia-nonglvnd-vulkan-icd <none> <none> (no description available)
ii nvidia-persistenced 515.65.01-1 amd64 daemon to maintain persistent software state in the NVIDIA driver
un nvidia-settings <none> <none> (no description available)
ii nvidia-smi 515.65.01-1 amd64 NVIDIA System Management Interface
ii nvidia-support 20151021+13 amd64 NVIDIA binary graphics driver support files
un nvidia-tesla-418-vulkan-icd <none> <none> (no description available)
un nvidia-tesla-440-vulkan-icd <none> <none> (no description available)
un nvidia-tesla-alternative <none> <none> (no description available)
un nvidia-vdpau-driver <none> <none> (no description available)
ii nvidia-vulkan-common 515.65.01-1 amd64 NVIDIA Vulkan driver - common files
ii nvidia-vulkan-icd:amd64 515.65.01-1 amd64 NVIDIA Vulkan installable client driver (ICD)
un nvidia-vulkan-icd-any <none> <none> (no description available)
rc xserver-xorg-video-nvidia 515.65.01-1 amd64 NVIDIA binary Xorg driver
un xserver-xorg-video-nvidia-any <none> <none> (no description available)
un xserver-xorg-video-nvidia-legacy-304xx <none> <none> (no description available)
Maybe try a reboot?
nvidia-container-cli: initialization error: nvml error: driver not loaded
That's from the host (TrueNAS). That's not supposed to fail on a system with an nvidia card... And I think it was working for you before? I hope we didn't break your TrueNAS installation.
It works if I run it (the command from the CLI) on the host, which is odd.
I noticed something interesting - it fails when the jail starts, and keeps failing when run inside the jail from the CLI, until I start my compose stack. As soon as Plex etc. come up it works, so something obviously is being initialized.
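That pattern (failing until something actually touches the GPU) suggests the driver and its /dev nodes are only initialized once a process opens the device. A hypothetical workaround, not something the script is shown doing in this thread, would be to poke the driver on the host before querying nvidia-container-cli:
# Hypothetical workaround: run nvidia-smi once on the host so the nvidia kernel
# module is initialized and the /dev/nvidia* nodes exist, then query the file list.
import subprocess

subprocess.run(["nvidia-smi"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
result = subprocess.run(["nvidia-container-cli", "list"], capture_output=True, text=True)
print(result.stdout)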
And you didn't notice this behavior with the previous method (downloading and installing the drivers from the .run file)?
And you didn't notice this behavior with the previous method (downloading and installing the drivers from the .run file)?
Correct.
Here's a list of all files that were created (L) or changed (X) since installing the nvidia driver with the .run file (the previously working method).
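# Dry-run rsync comparison of the two rootfs copies:
#   L = only in rootfs (created), R = only in rootfs_before (removed),
#   X = present in both but changed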
LEFT_DIR=rootfs
RIGHT_DIR=rootfs_before
rsync -rinl --ignore-existing "$LEFT_DIR"/ "$RIGHT_DIR"/|sed -e 's/^[^ ]* /L /'
rsync -rinl --ignore-existing "$RIGHT_DIR"/ "$LEFT_DIR"/|sed -e 's/^[^ ]* /R /'
rsync -rinl --existing "$LEFT_DIR"/ "$RIGHT_DIR"/|sed -e 's/^/X /'
L etc/OpenCL/
L etc/OpenCL/vendors/
L etc/OpenCL/vendors/nvidia.icd
L etc/systemd/system/systemd-hibernate.service.wants/
L etc/systemd/system/systemd-hibernate.service.wants/nvidia-hibernate.service -> /usr/lib/systemd/system/nvidia-hibernate.service
L etc/systemd/system/systemd-hibernate.service.wants/nvidia-resume.service -> /usr/lib/systemd/system/nvidia-resume.service
L etc/systemd/system/systemd-suspend.service.wants/
L etc/systemd/system/systemd-suspend.service.wants/nvidia-resume.service -> /usr/lib/systemd/system/nvidia-resume.service
L etc/systemd/system/systemd-suspend.service.wants/nvidia-suspend.service -> /usr/lib/systemd/system/nvidia-suspend.service
L etc/vulkan/
L etc/vulkan/icd.d/
L etc/vulkan/icd.d/nvidia_icd.json
L etc/vulkan/implicit_layer.d/
L etc/vulkan/implicit_layer.d/nvidia_layers.json
L usr/bin/nvidia-bug-report.sh
L usr/bin/nvidia-cuda-mps-control
L usr/bin/nvidia-cuda-mps-server
L usr/bin/nvidia-debugdump
L usr/bin/nvidia-installer
L usr/bin/nvidia-modprobe
L usr/bin/nvidia-ngx-updater
L usr/bin/nvidia-persistenced
L usr/bin/nvidia-powerd
L usr/bin/nvidia-settings
L usr/bin/nvidia-sleep.sh
L usr/bin/nvidia-smi
L usr/bin/nvidia-uninstall -> nvidia-installer
L usr/bin/nvidia-xconfig
L usr/lib/libGL.so.1 -> /usr/lib/x86_64-linux-gnu/libGL.so.1
L usr/lib/firmware/
L usr/lib/firmware/nvidia/
L usr/lib/firmware/nvidia/515.65.01/
L usr/lib/firmware/nvidia/515.65.01/gsp.bin
L usr/lib/nvidia/
L usr/lib/nvidia/egl_dummy_vendor.json
L usr/lib/nvidia/glvnd_check
L usr/lib/nvidia/libGLX_installcheck.so.0
L usr/lib/systemd/system-sleep/nvidia
L usr/lib/systemd/system/nvidia-hibernate.service
L usr/lib/systemd/system/nvidia-powerd.service
L usr/lib/systemd/system/nvidia-resume.service
L usr/lib/systemd/system/nvidia-suspend.service
L usr/lib/x86_64-linux-gnu/libEGL.so -> libEGL.so.1
L usr/lib/x86_64-linux-gnu/libEGL.so.1 -> libEGL.so.1.1.0
L usr/lib/x86_64-linux-gnu/libEGL.so.1.1.0
L usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 -> libEGL_nvidia.so.515.65.01
L usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.515.65.01
L usr/lib/x86_64-linux-gnu/libGL.so -> libGL.so.1
L usr/lib/x86_64-linux-gnu/libGL.so.1 -> libGL.so.1.7.0
L usr/lib/x86_64-linux-gnu/libGL.so.1.7.0
L usr/lib/x86_64-linux-gnu/libGLESv1_CM.so -> libGLESv1_CM.so.1
L usr/lib/x86_64-linux-gnu/libGLESv1_CM.so.1 -> libGLESv1_CM.so.1.2.0
L usr/lib/x86_64-linux-gnu/libGLESv1_CM.so.1.2.0
L usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.515.65.01
L usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.515.65.01
L usr/lib/x86_64-linux-gnu/libGLESv2.so -> libGLESv2.so.2
L usr/lib/x86_64-linux-gnu/libGLESv2.so.2 -> libGLESv2.so.2.1.0
L usr/lib/x86_64-linux-gnu/libGLESv2.so.2.1.0
L usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.515.65.01
L usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.515.65.01
L usr/lib/x86_64-linux-gnu/libGLX.so -> libGLX.so.0
L usr/lib/x86_64-linux-gnu/libGLX.so.0
L usr/lib/x86_64-linux-gnu/libGLX_indirect.so.0 -> libGLX_nvidia.so.515.65.01
L usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0 -> libGLX_nvidia.so.515.65.01
L usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.515.65.01
L usr/lib/x86_64-linux-gnu/libGLdispatch.so.0
L usr/lib/x86_64-linux-gnu/libOpenCL.so -> libOpenCL.so.1
L usr/lib/x86_64-linux-gnu/libOpenCL.so.1 -> libOpenCL.so.1.0
L usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0 -> libOpenCL.so.1.0.0
L usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0
L usr/lib/x86_64-linux-gnu/libOpenGL.so -> libOpenGL.so.0
L usr/lib/x86_64-linux-gnu/libOpenGL.so.0
L usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
L usr/lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.515.65.01
L usr/lib/x86_64-linux-gnu/libcuda.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvcuvid.so -> libnvcuvid.so.1
L usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 -> libnvcuvid.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvcuvid.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-allocator.so -> libnvidia-allocator.so.1
L usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.1 -> libnvidia-allocator.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-cfg.so -> libnvidia-cfg.so.1
L usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1 -> libnvidia-cfg.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1 -> libnvidia-egl-gbm.so.1.1.0
L usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1.1.0
L usr/lib/x86_64-linux-gnu/libnvidia-egl-wayland.so.1 -> libnvidia-egl-wayland.so.1.1.9
L usr/lib/x86_64-linux-gnu/libnvidia-egl-wayland.so.1.1.9
L usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-encode.so -> libnvidia-encode.so.1
L usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 -> libnvidia-encode.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-encode.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-fbc.so -> libnvidia-fbc.so.1
L usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.1 -> libnvidia-fbc.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-gtk2.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-gtk3.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-ml.so -> libnvidia-ml.so.1
L usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 -> libnvidia-ml.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-ml.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.1 -> libnvidia-ngx.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so -> libnvidia-nvvm.so.4
L usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.4 -> libnvidia-nvvm.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1 -> libnvidia-opencl.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so -> libnvidia-opticalflow.so.1
L usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1 -> libnvidia-opticalflow.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so -> libnvidia-ptxjitcompiler.so.1
L usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-vulkan-producer.so -> libnvidia-vulkan-producer.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-vulkan-producer.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvidia-wayland-client.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvoptix.so.1 -> libnvoptix.so.515.65.01
L usr/lib/x86_64-linux-gnu/libnvoptix.so.515.65.01
L usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so -> vdpau/libvdpau_nvidia.so.515.65.01
L usr/lib/x86_64-linux-gnu/gbm/
L usr/lib/x86_64-linux-gnu/gbm/nvidia-drm_gbm.so -> /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.1
L usr/lib/x86_64-linux-gnu/nvidia/
L usr/lib/x86_64-linux-gnu/nvidia/wine/
L usr/lib/x86_64-linux-gnu/nvidia/wine/_nvngx.dll
L usr/lib/x86_64-linux-gnu/nvidia/wine/nvngx.dll
L usr/lib/x86_64-linux-gnu/vdpau/
L usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.1 -> libvdpau_nvidia.so.515.65.01
L usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.515.65.01
L usr/lib64/xorg/
L usr/lib64/xorg/modules/
L usr/lib64/xorg/modules/drivers/
L usr/lib64/xorg/modules/drivers/nvidia_drv.so
L usr/lib64/xorg/modules/extensions/
L usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so -> libglxserver_nvidia.so.515.65.01
L usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so.515.65.01
L usr/share/applications/nvidia-settings.desktop
L usr/share/doc/NVIDIA_GLX-1.0/
L usr/share/doc/NVIDIA_GLX-1.0/LICENSE
L usr/share/doc/NVIDIA_GLX-1.0/NVIDIA_Changelog
L usr/share/doc/NVIDIA_GLX-1.0/README.txt
L usr/share/doc/NVIDIA_GLX-1.0/nvidia-dbus.conf
L usr/share/doc/NVIDIA_GLX-1.0/nvidia-settings.png
L usr/share/doc/NVIDIA_GLX-1.0/html/
L usr/share/doc/NVIDIA_GLX-1.0/html/acknowledgements.html
L usr/share/doc/NVIDIA_GLX-1.0/html/addressingcapabilities.html
L usr/share/doc/NVIDIA_GLX-1.0/html/addtlresources.html
L usr/share/doc/NVIDIA_GLX-1.0/html/appendices.html
L usr/share/doc/NVIDIA_GLX-1.0/html/audiosupport.html
L usr/share/doc/NVIDIA_GLX-1.0/html/commonproblems.html
L usr/share/doc/NVIDIA_GLX-1.0/html/configlaptop.html
L usr/share/doc/NVIDIA_GLX-1.0/html/configmultxscreens.html
L usr/share/doc/NVIDIA_GLX-1.0/html/configtwinview.html
L usr/share/doc/NVIDIA_GLX-1.0/html/depth30.html
L usr/share/doc/NVIDIA_GLX-1.0/html/displaydevicenames.html
L usr/share/doc/NVIDIA_GLX-1.0/html/dma_issues.html
L usr/share/doc/NVIDIA_GLX-1.0/html/dpi.html
L usr/share/doc/NVIDIA_GLX-1.0/html/dynamicboost.html
L usr/share/doc/NVIDIA_GLX-1.0/html/dynamicpowermanagement.html
L usr/share/doc/NVIDIA_GLX-1.0/html/editxconfig.html
L usr/share/doc/NVIDIA_GLX-1.0/html/egpu.html
L usr/share/doc/NVIDIA_GLX-1.0/html/faq.html
L usr/share/doc/NVIDIA_GLX-1.0/html/flippingubb.html
L usr/share/doc/NVIDIA_GLX-1.0/html/framelock.html
L usr/share/doc/NVIDIA_GLX-1.0/html/gbm.html
L usr/share/doc/NVIDIA_GLX-1.0/html/glxsupport.html
L usr/share/doc/NVIDIA_GLX-1.0/html/gpunames.html
L usr/share/doc/NVIDIA_GLX-1.0/html/gsp.html
L usr/share/doc/NVIDIA_GLX-1.0/html/i2c.html
L usr/share/doc/NVIDIA_GLX-1.0/html/index.html
L usr/share/doc/NVIDIA_GLX-1.0/html/installationandconfiguration.html
L usr/share/doc/NVIDIA_GLX-1.0/html/installdriver.html
L usr/share/doc/NVIDIA_GLX-1.0/html/installedcomponents.html
L usr/share/doc/NVIDIA_GLX-1.0/html/introduction.html
L usr/share/doc/NVIDIA_GLX-1.0/html/kernel_open.html
L usr/share/doc/NVIDIA_GLX-1.0/html/kms.html
L usr/share/doc/NVIDIA_GLX-1.0/html/knownissues.html
L usr/share/doc/NVIDIA_GLX-1.0/html/minimumrequirements.html
L usr/share/doc/NVIDIA_GLX-1.0/html/newusertips.html
L usr/share/doc/NVIDIA_GLX-1.0/html/ngx.html
L usr/share/doc/NVIDIA_GLX-1.0/html/nvidia-debugdump.html
L usr/share/doc/NVIDIA_GLX-1.0/html/nvidia-ml.html
L usr/share/doc/NVIDIA_GLX-1.0/html/nvidia-peermem.html
L usr/share/doc/NVIDIA_GLX-1.0/html/nvidia-persistenced.html
L usr/share/doc/NVIDIA_GLX-1.0/html/nvidia-smi.html
L usr/share/doc/NVIDIA_GLX-1.0/html/nvidiasettings.html
L usr/share/doc/NVIDIA_GLX-1.0/html/openglenvvariables.html
L usr/share/doc/NVIDIA_GLX-1.0/html/optimus.html
L usr/share/doc/NVIDIA_GLX-1.0/html/powermanagement.html
L usr/share/doc/NVIDIA_GLX-1.0/html/primerenderoffload.html
L usr/share/doc/NVIDIA_GLX-1.0/html/procinterface.html
L usr/share/doc/NVIDIA_GLX-1.0/html/profiles.html
L usr/share/doc/NVIDIA_GLX-1.0/html/programmingmodes.html
L usr/share/doc/NVIDIA_GLX-1.0/html/randr14.html
L usr/share/doc/NVIDIA_GLX-1.0/html/retpoline.html
L usr/share/doc/NVIDIA_GLX-1.0/html/selectdriver.html
L usr/share/doc/NVIDIA_GLX-1.0/html/sli.html
L usr/share/doc/NVIDIA_GLX-1.0/html/supportedchips.html
L usr/share/doc/NVIDIA_GLX-1.0/html/vdpausupport.html
L usr/share/doc/NVIDIA_GLX-1.0/html/wayland-issues.html
L usr/share/doc/NVIDIA_GLX-1.0/html/xcompositeextension.html
L usr/share/doc/NVIDIA_GLX-1.0/html/xconfigoptions.html
L usr/share/doc/NVIDIA_GLX-1.0/html/xineramaglx.html
L usr/share/doc/NVIDIA_GLX-1.0/html/xrandrextension.html
L usr/share/doc/NVIDIA_GLX-1.0/html/xwayland.html
L usr/share/doc/NVIDIA_GLX-1.0/samples/
L usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2
L usr/share/doc/NVIDIA_GLX-1.0/supported-gpus/
L usr/share/doc/NVIDIA_GLX-1.0/supported-gpus/LICENSE
L usr/share/doc/NVIDIA_GLX-1.0/supported-gpus/supported-gpus.json
L usr/share/egl/
L usr/share/egl/egl_external_platform.d/
L usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json
L usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
L usr/share/glvnd/
L usr/share/glvnd/egl_vendor.d/
L usr/share/glvnd/egl_vendor.d/10_nvidia.json
L usr/share/man/man1/nvidia-cuda-mps-control.1.gz
L usr/share/man/man1/nvidia-installer.1.gz
L usr/share/man/man1/nvidia-modprobe.1.gz
L usr/share/man/man1/nvidia-persistenced.1.gz
L usr/share/man/man1/nvidia-settings.1.gz
L usr/share/man/man1/nvidia-smi.1.gz
L usr/share/man/man1/nvidia-xconfig.1.gz
L usr/share/nvidia/
L usr/share/nvidia/nvidia-application-profiles-515.65.01-key-documentation
L usr/share/nvidia/nvidia-application-profiles-515.65.01-rc
L var/lib/nvidia/
L var/lib/nvidia/dirs
L var/lib/nvidia/log
X >f.sT...... etc/ld.so.cache
X >f.sT...... root/.bash_history
X >f.sT...... var/cache/ldconfig/aux-cache
X >f..T...... var/log/lastlog
X >f.sT...... var/log/nvidia-installer.log
X >f.sT...... var/log/wtmp
X >f..T...... var/log/journal/f0db7addd78847dfb4ed5576d9813374/system.journal
I'm not exactly sure what the difference is between now and what I was doing yesterday, but Plex transcoding is now working for me as well!
Used the latest version of the script, created a container, and installed docker and the nvidia container toolkit inside. Started Plex and it's HW transcoding now. Awesome!
Glad to hear it's working for you now!
Did you run into the same issue as @Ixian?
It would be great if you could run nvidia-container-cli list (on the host as well as in the jail) while your Plex container is running and see if you run into "initialization error: nvml error: driver not loaded".
Perhaps even try to make a few jails, all with GPU passthrough, to test if the GPU can be properly accessed simultaneously.
I just opened this issue to see if we're still missing something in our setup and ask how to solve the initialization error. Additional data points would be very helpful.
Tried the new script and had a crazy time trying to get Debian 11 to work; failed big time. Switched over to Ubuntu Jammy with no issues. nvidia-smi worked right off, but obviously no nvidia docker runtime yet.
This was my process, in case anybody wants to repeat it:
curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt update
apt install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Getting this error:
Looks like everything might not be getting passed through.