nandlab opened this issue 1 year ago (status: Open)
@nandlab I don't recall which version of the toolkit supports the 390.157 driver. With that said, you may be able to generate a CDI specification on your system using `nvidia-ctk cdi generate` and then use the generated spec.
Podman (>= 4.1.0) natively supports CDI, and it is possible to configure the nvidia-container-runtime to perform the injection of the devices when using the Docker CLI.
Does the `nvidia-ctk cdi generate` command generate a spec on your system?
@elezar Thank you for the fast reply!
`sudo nvidia-ctk cdi generate` outputs:
INFO[0000] Auto-detected mode as "nvml"
INFO[0000] Selecting /dev/nvidia0 as /dev/nvidia0
INFO[0000] Selecting /dev/dri/card0 as /dev/dri/card0
INFO[0000] Selecting /dev/dri/renderD128 as /dev/dri/renderD128
nvidia-ctk: symbol lookup error: nvidia-ctk: undefined symbol: nvmlDeviceGetMaxMigDeviceCount
It exits with an error code of 127.
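An `undefined symbol` failure like this means the driver's `libnvidia-ml.so` predates the NVML call the tool makes (`nvmlDeviceGetMaxMigDeviceCount` exists only in MIG-capable drivers). One way to check whether a shared library exports a given symbol is to probe it at runtime; a sketch using Python's `ctypes`, with the current process's own symbols standing in since the NVIDIA library only exists on the affected machine:

```python
import ctypes

def has_symbol(library: ctypes.CDLL, symbol: str) -> bool:
    """Return True if the loaded shared library exports `symbol`."""
    try:
        getattr(library, symbol)  # triggers a dlsym() lookup
        return True
    except AttributeError:
        return False

# CDLL(None) gives access to the symbols of the running process (incl. libc).
# On the affected host one would instead load the driver library, e.g.
# ctypes.CDLL("libnvidia-ml.so.390.157") -- a hypothetical path.
lib = ctypes.CDLL(None)
print(has_symbol(lib, "printf"))                          # True: exported by libc
print(has_symbol(lib, "nvmlDeviceGetMaxMigDeviceCount"))  # False: absent here
```

Running this against the actual `libnvidia-ml.so.390.157` would confirm whether the symbol is missing from the 390-series driver.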
Btw, here is the output of `nvidia-smi`:
Mon May 22 12:57:20 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.157 Driver Version: 390.157 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 NVS 5400M Off | 00000000:01:00.0 N/A | N/A |
| N/A 42C P8 N/A / N/A | 52MiB / 959MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
The output of `nvidia-smi` is the same in the container, but I get no hardware-accelerated graphics anyway. Here is how I start an Ubuntu container for testing:
sudo docker run -it -h "$HOSTNAME" -e "DISPLAY=$DISPLAY" -v '/tmp/.X11-unix:/tmp/.X11-unix' -v "$HOME/.Xauthority:/root/.Xauthority" --runtime nvidia --gpus 'all,capabilities=utility' --rm ubuntu
Is there anything else I can try?
OK, that should not fail in this mode, since we don't expect to generate specs for MIG devices in any case. I will create a ticket to track this.
For now, you could use:
nvidia-ctk cdi generate --mode=management
to generate a basic spec with a single device (`nvidia.com/gpu=all`). Does that produce output?
`nvidia-ctk cdi generate --mode=management` also fails:
INFO[0000] Selecting /dev/nvidia-modeset as /dev/nvidia-modeset
INFO[0000] Selecting /dev/nvidia-uvm as /dev/nvidia-uvm
INFO[0000] Selecting /dev/nvidia-uvm-tools as /dev/nvidia-uvm-tools
INFO[0000] Selecting /dev/nvidia0 as /dev/nvidia0
INFO[0000] Selecting /dev/nvidiactl as /dev/nvidiactl
WARN[0000] Could not locate /dev/nvidia-caps/nvidia-cap*: pattern /dev/nvidia-caps/nvidia-cap* not found
ERRO[0000] failed to generate CDI spec: failed to create edits common for entities: failed to get CUDA version: failed to locate libcuda.so: pattern libcuda.so.*.*.* not found
@nandlab which version of the toolkit is this? The final error you're seeing should be addressed in the latest version (v1.13.1), but maybe something was missed in the fix for that.
Note I have created https://gitlab.com/nvidia/cloud-native/go-nvlib/-/merge_requests/40 to start working on the initial error you're seeing and will update the NVIDIA Container Toolkit once that is merged.
My installed version of `nvidia-container-toolkit-base` is `1.13.1-1 (buster)`.
Actually, looking at your nvidia-smi output, I would assume that your libcuda library is `libcuda.so.390.157` and not `libcuda.so.390.157.x`, which is the pattern that we're trying to match. I have created https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/397 which should allow this to proceed. Would you be able to test with a build of this executable?
You should be able to run `make docker-cmd-nvidia-ctk` to generate a local `nvidia-ctk` binary with the changes for testing purposes.
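The version-pattern mismatch can be reproduced with shell-style globbing. The toolkit itself is written in Go, so this Python `fnmatch` sketch only illustrates the matching logic, not the actual code:

```python
from fnmatch import fnmatch

old_pattern = "libcuda.so.*.*.*"  # expects MAJOR.MINOR.PATCH, e.g. 470.161.03
new_pattern = "libcuda.so.*.*"    # MAJOR.MINOR is enough for legacy drivers

print(fnmatch("libcuda.so.390.157", old_pattern))     # False: only one extra dot
print(fnmatch("libcuda.so.390.157", new_pattern))     # True
print(fnmatch("libcuda.so.470.161.03", new_pattern))  # True: '*' can span dots
```

The legacy 390-series libraries carry only a two-component version suffix, so a pattern requiring three components never matches them.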
I tried `make docker-cmd-nvidia-ctk` on the `use-major-minor-for-cuda-version` branch, but it aborts with:
if [ x"" = x"" ]; then \
docker build \
--progress=plain \
--build-arg GOLANG_VERSION="1.20.3" \
--tag nvidia/container-toolkit-build:golang1.20.3 \
-f docker/Dockerfile.devel \
docker; \
fi
Sending build context to Docker daemon 20.48kB
Step 1/7 : ARG GOLANG_VERSION=x.x.x
Step 2/7 : FROM golang:${GOLANG_VERSION}
---> 4237fa9a9df4
Step 3/7 : RUN go install golang.org/x/lint/golint@6edffad5e6160f5949cdefc81710b2706fbcd4f6
---> Using cache
---> ac387ef1abdf
Step 4/7 : RUN go install github.com/matryer/moq@latest
---> Using cache
---> 1b8cb9c74df0
Step 5/7 : RUN go install github.com/gordonklaus/ineffassign@d2c82e48359b033cde9cf1307f6d5550b8d61321
---> Using cache
---> 60ba1079891b
Step 6/7 : RUN go install github.com/client9/misspell/cmd/misspell@latest
---> Using cache
---> 3c825ab8aa3d
Step 7/7 : RUN go install github.com/google/go-licenses@latest
---> Using cache
---> 96a57dc20a94
Successfully built 96a57dc20a94
Successfully tagged nvidia/container-toolkit-build:golang1.20.3
Running 'make cmd-nvidia-ctk' in docker container nvidia/container-toolkit-build:golang1.20.3
docker run \
--rm \
-e GOCACHE=/tmp/.cache \
-v : \
-w \
--user $(id -u):$(id -g) \
nvidia/container-toolkit-build:golang1.20.3 \
make cmd-nvidia-ctk
docker: Error response from daemon: the working directory '--user' is invalid, it needs to be an absolute path.
See 'docker run --help'.
make: *** [Makefile:141: docker-cmd-nvidia-ctk] Error 125
It looks like the `-w` argument to Docker expects a working-directory path as its argument.
The make target is:
$(DOCKER_TARGETS): docker-%: .build-image
@echo "Running 'make $(*)' in docker container $(BUILDIMAGE)"
$(DOCKER) run \
--rm \
-e GOCACHE=/tmp/.cache \
-v $(PWD):$(PWD) \
-w $(PWD) \
--user $$(id -u):$$(id -g) \
$(BUILDIMAGE) \
make $(*)
meaning that in your case the PWD
envvar / make variable is not set. Could you repeat with:
PWD=$(pwd) make docker-cmd-nvidia-ctk
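`make` imports every environment variable as a make variable, and `PWD` is just an ordinary variable that interactive shells happen to set; under `sudo su`, cron, or some minimal shells it can be absent, which is what produces the empty `-v`/`-w` flags above. A small Python sketch of the underlying behaviour:

```python
import os
import subprocess

# Run `printenv PWD` with PWD stripped from the environment: nothing sets it
# back, so the lookup fails -- the same way $(PWD) expands to nothing in make.
env = {k: v for k, v in os.environ.items() if k != "PWD"}
result = subprocess.run(["printenv", "PWD"], env=env,
                        capture_output=True, text=True)
print(repr(result.stdout), result.returncode)  # '' 1
```

Prefixing the command with `PWD=$(pwd)` simply puts the variable back into the environment that `make` inherits. (GNU make also provides `$(CURDIR)`, which is set by make itself and would avoid this dependency.)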
With `PWD=$(pwd) make docker-cmd-nvidia-ctk` the compilation worked fine.
Output of `./nvidia-ctk cdi generate` (did not change):
INFO[0000] Auto-detected mode as "nvml"
INFO[0000] Selecting /dev/nvidia0 as /dev/nvidia0
INFO[0000] Selecting /dev/dri/card0 as /dev/dri/card0
INFO[0000] Selecting /dev/dri/renderD128 as /dev/dri/renderD128
./nvidia-ctk: symbol lookup error: ./nvidia-ctk: undefined symbol: nvmlDeviceGetMaxMigDeviceCount
Output of `./nvidia-ctk cdi generate --mode=management` (looks good, but there are a few warnings):
INFO[0000] Selecting /dev/nvidia-modeset as /dev/nvidia-modeset
INFO[0000] Selecting /dev/nvidia-uvm as /dev/nvidia-uvm
INFO[0000] Selecting /dev/nvidia-uvm-tools as /dev/nvidia-uvm-tools
INFO[0000] Selecting /dev/nvidia0 as /dev/nvidia0
INFO[0000] Selecting /dev/nvidiactl as /dev/nvidiactl
WARN[0000] Could not locate /dev/nvidia-caps/nvidia-cap*: pattern /dev/nvidia-caps/nvidia-cap* not found
INFO[0000] Using driver version 390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libEGL_nvidia.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libEGL_nvidia.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv1_CM_nvidia.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv1_CM_nvidia.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv2_nvidia.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv2_nvidia.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLX_nvidia.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLX_nvidia.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libcuda.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libcuda.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvcuvid.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvcuvid.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-cfg.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-cfg.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-encode.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-encode.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ml.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ml.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ptxjitcompiler.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ptxjitcompiler.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libvdpau_nvidia.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libvdpau_nvidia.so.390.157
WARN[0000] Could not locate /nvidia-persistenced/socket: pattern /nvidia-persistenced/socket not found
WARN[0000] Could not locate /nvidia-fabricmanager/socket: pattern /nvidia-fabricmanager/socket not found
WARN[0000] Could not locate /tmp/nvidia-mps: pattern /tmp/nvidia-mps not found
WARN[0000] Could not locate /lib/firmware/nvidia/390.157/gsp*.bin: pattern /lib/firmware/nvidia/390.157/gsp*.bin not found
INFO[0000] Selecting /usr/bin/nvidia-smi as /usr/bin/nvidia-smi
INFO[0000] Selecting /usr/bin/nvidia-debugdump as /usr/bin/nvidia-debugdump
INFO[0000] Selecting /usr/bin/nvidia-persistenced as /usr/bin/nvidia-persistenced
WARN[0000] Could not locate nvidia-cuda-mps-control: pattern nvidia-cuda-mps-control not found
WARN[0000] Could not locate nvidia-cuda-mps-server: pattern nvidia-cuda-mps-server not found
INFO[0000] Generated CDI spec with version 0.3.0
cdiVersion: 0.3.0
containerEdits:
hooks:
- args:
- nvidia-ctk
- hook
- update-ldcache
- --folder
- /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx
hookName: createContainer
path: /usr/bin/nvidia-ctk
mounts:
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv2_nvidia.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv2_nvidia.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLX_nvidia.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLX_nvidia.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvcuvid.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvcuvid.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-cfg.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-cfg.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ptxjitcompiler.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ptxjitcompiler.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libvdpau_nvidia.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libvdpau_nvidia.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libEGL_nvidia.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libEGL_nvidia.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv1_CM_nvidia.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv1_CM_nvidia.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libcuda.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libcuda.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-encode.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-encode.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ml.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ml.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/bin/nvidia-smi
hostPath: /usr/bin/nvidia-smi
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/bin/nvidia-debugdump
hostPath: /usr/bin/nvidia-debugdump
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/bin/nvidia-persistenced
hostPath: /usr/bin/nvidia-persistenced
options:
- ro
- nosuid
- nodev
- bind
devices:
- containerEdits:
deviceNodes:
- path: /dev/nvidia0
- path: /dev/nvidiactl
- path: /dev/nvidia-modeset
- path: /dev/nvidia-uvm
- path: /dev/nvidia-uvm-tools
name: all
kind: nvidia.com/gpu
Can the warnings be ignored?
Thanks for the update. Those warnings are expected in this case.
Note that I have created https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/398 which should fix the `nvidia-ctk cdi generate` (default mode) command. Would you also be able to test that build?
Thank you for the support!
sudo podman run -ti --rm --device=nvidia.com/gpu=0 ubuntu:18.04 nvidia-smi -L
says:
Error: stat nvidia.com/gpu=0: no such file or directory
My Podman version is 3.0.1.
Is my Podman too old for CDI? Can you pass the CDI yaml with a different option?
A podman version of at least 4.1.0 would be required for native CDI support. If this cannot be installed or built from source, an alternative is to:

1. Generate the CDI specification `/etc/cdi/nvidia.yaml` by running `sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml`
2. Ensure that the `/etc/cdi/nvidia.yaml` file is world-readable (this will be addressed in the next release): `sudo chmod 644 /etc/cdi/nvidia.yaml`
3. Configure the `nvidia-container-runtime` to use CDI: change the `mode = "auto"` setting in `/etc/nvidia-container-runtime/config.toml` to `mode = "cdi"`
4. Run `sudo nvidia-ctk runtime configure` and restart the docker daemon: `sudo systemctl restart docker`
5. Launch the container using the `nvidia` runtime. For Docker this would be: `docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 {{IMAGE}}`. For podman, specify the full path to the runtime via `--runtime`: `podman run --rm -ti --runtime=/usr/bin/nvidia-container-runtime NVIDIA_VISIBLE_DEVICES=0 {{IMAGE}}`
Note that `NVIDIA_VISIBLE_DEVICES=0` can also be replaced with `NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=0`, as the NVIDIA Container Runtime in CDI mode will assume the `nvidia.com/gpu` CDI device class by default.
We do need to update our documentation to better describe this process, so please let us know if this is unclear.
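The version gate above can be checked mechanically before attempting the native-CDI path. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` version strings as reported by `podman --version`:

```python
def supports_native_cdi(version: str, minimum=(4, 1, 0)) -> bool:
    """Return True if a podman version string meets the native-CDI minimum."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only digits so suffixes like "4.1.0-dev" do not break int().
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # pad "4.1" to (4, 1, 0)
    return tuple(parts) >= minimum

print(supports_native_cdi("3.0.1"))  # False: falls back to the runtime path
print(supports_native_cdi("4.1.0"))  # True: --device nvidia.com/gpu=... works
```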
Hi, sorry for the late response.
I followed your steps but it still does not work.
`sudo podman run --rm -ti --runtime=/usr/bin/nvidia-container-runtime NVIDIA_VISIBLE_DEVICES=0` prints
Error: invalid reference format
and exits with code 125.
`sudo docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 ubuntu` prints
docker: Error response from daemon: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=0: unknown.
and exits with code 125.
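The `unresolvable CDI devices nvidia.com/gpu=0` error is consistent with the management-mode spec shown earlier, which defines only the device `name: all`, so only `nvidia.com/gpu=all` can resolve. Which names a spec actually defines can be sketched as below (naive line-based parsing to avoid a YAML dependency; `spec` is a trimmed sample mirroring the spec in this thread):

```python
import re

spec = """\
cdiVersion: 0.3.0
kind: nvidia.com/gpu
devices:
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
  name: all
"""

def device_names(cdi_yaml: str) -> list[str]:
    """List the fully-qualified device names a CDI spec defines."""
    kind = re.search(r"^kind:\s*(\S+)", cdi_yaml, re.M).group(1)
    names = re.findall(r"^\s+name:\s*\"?([\w.-]+)\"?", cdi_yaml, re.M)
    return [f"{kind}={n}" for n in names]

print(device_names(spec))  # ['nvidia.com/gpu=all'] -- gpu=0 would not resolve
```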
Instead, `sudo docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ubuntu` prints
docker: Error response from daemon: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: failed to inject devices: failed to stat CDI host device "/dev/nvidia-uvm": no such file or directory: unknown.
and exits with code 127.
The device `/dev/nvidia-uvm` does indeed not exist on my machine.
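`/dev/nvidia-uvm` is normally created on demand (e.g. by `nvidia-modprobe` or the first CUDA application), so its absence on a display-only setup is plausible. Checking which device nodes exist before generating a spec can be sketched as follows (the paths are the standard NVIDIA device nodes; on a machine without the driver loaded they will all be missing):

```python
from pathlib import Path

NVIDIA_NODES = ["/dev/nvidia0", "/dev/nvidiactl",
                "/dev/nvidia-uvm", "/dev/nvidia-modeset"]

def present_nodes(paths=NVIDIA_NODES) -> dict[str, bool]:
    """Map each expected device node to whether it exists on this host."""
    return {p: Path(p).exists() for p in paths}

for node, ok in present_nodes().items():
    print(("present" if ok else "missing"), node)
```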
I recreated `/etc/cdi/nvidia.yaml` with the newest `nvidia-ctk` from the container-toolkit main branch:
INFO[0000] Auto-detected mode as "nvml"
INFO[0000] Selecting /dev/nvidia0 as /dev/nvidia0
INFO[0000] Selecting /dev/dri/card0 as /dev/dri/card0
INFO[0000] Selecting /dev/dri/renderD128 as /dev/dri/renderD128
INFO[0000] Using driver version 390.157
INFO[0000] Selecting /dev/nvidia-modeset as /dev/nvidia-modeset
WARN[0000] Could not locate /dev/nvidia-uvm-tools: pattern /dev/nvidia-uvm-tools not found
WARN[0000] Could not locate /dev/nvidia-uvm: pattern /dev/nvidia-uvm not found
INFO[0000] Selecting /dev/nvidiactl as /dev/nvidiactl
WARN[0000] Could not locate libnvidia-egl-gbm.so: 64-bit library libnvidia-egl-gbm.so not found
INFO[0000] Selecting /usr/share/glvnd/egl_vendor.d/10_nvidia.json as /usr/share/glvnd/egl_vendor.d/10_nvidia.json
INFO[0000] Selecting /usr/share/vulkan/icd.d/nvidia_icd.json as /usr/share/vulkan/icd.d/nvidia_icd.json
INFO[0000] Selecting /usr/share/vulkan/implicit_layer.d/nvidia_layers.json as /usr/share/vulkan/implicit_layer.d/nvidia_layers.json
WARN[0000] Could not locate egl/egl_external_platform.d/15_nvidia_gbm.json: pattern egl/egl_external_platform.d/15_nvidia_gbm.json not found
WARN[0000] Could not locate egl/egl_external_platform.d/10_nvidia_wayland.json: pattern egl/egl_external_platform.d/10_nvidia_wayland.json not found
WARN[0000] Could not locate nvidia/xorg/nvidia_drv.so: pattern nvidia/xorg/nvidia_drv.so not found
WARN[0000] Could not locate nvidia/xorg/libglxserver_nvidia.so.390.157: pattern nvidia/xorg/libglxserver_nvidia.so.390.157 not found
WARN[0000] Could not locate X11/xorg.conf.d/10-nvidia.conf: pattern X11/xorg.conf.d/10-nvidia.conf not found
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libEGL_nvidia.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libEGL_nvidia.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv1_CM_nvidia.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv1_CM_nvidia.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv2_nvidia.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv2_nvidia.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLX_nvidia.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLX_nvidia.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libcuda.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libcuda.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvcuvid.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvcuvid.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-cfg.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-cfg.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-encode.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-encode.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ml.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ml.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ptxjitcompiler.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ptxjitcompiler.so.390.157
INFO[0000] Selecting /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libvdpau_nvidia.so.390.157 as /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libvdpau_nvidia.so.390.157
INFO[0000] Selecting /run/nvidia-persistenced/socket as /run/nvidia-persistenced/socket
WARN[0000] Could not locate /nvidia-fabricmanager/socket: pattern /nvidia-fabricmanager/socket not found
WARN[0000] Could not locate /tmp/nvidia-mps: pattern /tmp/nvidia-mps not found
WARN[0000] Could not locate /lib/firmware/nvidia/390.157/gsp*.bin: pattern /lib/firmware/nvidia/390.157/gsp*.bin not found
INFO[0000] Selecting /usr/bin/nvidia-smi as /usr/bin/nvidia-smi
INFO[0000] Selecting /usr/bin/nvidia-debugdump as /usr/bin/nvidia-debugdump
INFO[0000] Selecting /usr/bin/nvidia-persistenced as /usr/bin/nvidia-persistenced
WARN[0000] Could not locate nvidia-cuda-mps-control: pattern nvidia-cuda-mps-control not found
WARN[0000] Could not locate nvidia-cuda-mps-server: pattern nvidia-cuda-mps-server not found
WARN[0000] Could not locate nvidia/xorg/nvidia_drv.so: pattern nvidia/xorg/nvidia_drv.so not found
WARN[0000] Could not locate nvidia/xorg/libglxserver_nvidia.so.390.157: pattern nvidia/xorg/libglxserver_nvidia.so.390.157 not found
INFO[0000] Generated CDI spec with version 0.5.0
cdiVersion: 0.5.0
containerEdits:
deviceNodes:
- path: /dev/nvidia-modeset
- path: /dev/nvidiactl
hooks:
- args:
- nvidia-ctk
- hook
- update-ldcache
- --folder
- /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx
hookName: createContainer
path: /usr/bin/nvidia-ctk
mounts:
- containerPath: /usr/share/glvnd/egl_vendor.d/10_nvidia.json
hostPath: /usr/share/glvnd/egl_vendor.d/10_nvidia.json
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/share/vulkan/icd.d/nvidia_icd.json
hostPath: /usr/share/vulkan/icd.d/nvidia_icd.json
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/share/vulkan/implicit_layer.d/nvidia_layers.json
hostPath: /usr/share/vulkan/implicit_layer.d/nvidia_layers.json
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv1_CM_nvidia.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv1_CM_nvidia.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv2_nvidia.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLESv2_nvidia.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libcuda.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libcuda.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-cfg.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-cfg.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ml.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ml.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libvdpau_nvidia.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libvdpau_nvidia.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libEGL_nvidia.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libEGL_nvidia.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLX_nvidia.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libGLX_nvidia.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvcuvid.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvcuvid.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-encode.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-encode.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ptxjitcompiler.so.390.157
hostPath: /usr/lib/x86_64-linux-gnu/nvidia/legacy-390xx/libnvidia-ptxjitcompiler.so.390.157
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /run/nvidia-persistenced/socket
hostPath: /run/nvidia-persistenced/socket
options:
- ro
- nosuid
- nodev
- bind
- noexec
- containerPath: /usr/bin/nvidia-smi
hostPath: /usr/bin/nvidia-smi
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/bin/nvidia-debugdump
hostPath: /usr/bin/nvidia-debugdump
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/bin/nvidia-persistenced
hostPath: /usr/bin/nvidia-persistenced
options:
- ro
- nosuid
- nodev
- bind
devices:
- containerEdits:
deviceNodes:
- path: /dev/nvidia0
- path: /dev/dri/card0
- path: /dev/dri/renderD128
hooks:
- args:
- nvidia-ctk
- hook
- create-symlinks
- --link
- ../card0::/dev/dri/by-path/pci-0000:01:00.0-card
- --link
- ../renderD128::/dev/dri/by-path/pci-0000:01:00.0-render
hookName: createContainer
path: /usr/bin/nvidia-ctk
- args:
- nvidia-ctk
- hook
- chmod
- --mode
- "755"
- --path
- /dev/dri
hookName: createContainer
path: /usr/bin/nvidia-ctk
name: "0"
- containerEdits:
deviceNodes:
- path: /dev/nvidia0
- path: /dev/dri/card0
- path: /dev/dri/renderD128
hooks:
- args:
- nvidia-ctk
- hook
- create-symlinks
- --link
- ../card0::/dev/dri/by-path/pci-0000:01:00.0-card
- --link
- ../renderD128::/dev/dri/by-path/pci-0000:01:00.0-render
hookName: createContainer
path: /usr/bin/nvidia-ctk
- args:
- nvidia-ctk
- hook
- chmod
- --mode
- "755"
- --path
- /dev/dri
hookName: createContainer
path: /usr/bin/nvidia-ctk
name: all
kind: nvidia.com/gpu
Now I can start a docker container with the nvidia runtime without problems, for example:
sudo docker run -it -h "$HOSTNAME" --ipc=host -e "DISPLAY=$DISPLAY" -v '/tmp/.X11-unix:/tmp/.X11-unix' -v "$HOME/.Xauthority:/root/.Xauthority" --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all ubuntu
But GUI programs are apparently still software-rendered; e.g. `glxgears` from `mesa-utils` uses 50% of the CPU.
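A quick way to confirm software rendering is to look at the "OpenGL renderer string" line that `glxinfo` (also in `mesa-utils`) reports inside the container; `llvmpipe`, `softpipe`, and `swrast` are Mesa's software rasterizers. A sketch of that check, run here against sample strings since `glxinfo` needs a live display:

```python
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe", "swrast")

def is_software_rendered(glxinfo_output: str) -> bool:
    """Heuristic: inspect the 'OpenGL renderer string' line from glxinfo."""
    for line in glxinfo_output.splitlines():
        if line.startswith("OpenGL renderer string:"):
            return any(s in line for s in SOFTWARE_RENDERERS)
    return False  # renderer line not found

print(is_software_rendered(
    "OpenGL renderer string: llvmpipe (LLVM 11.0.1, 256 bits)"))  # True
print(is_software_rendered(
    "OpenGL renderer string: NVS 5400M/PCIe/SSE2"))               # False
```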
@nandlab great news that you were able to get CDI injection working. The reason for not using the hardware renderer is most likely the missing X libraries listed in the log:
WARN[0000] Could not locate nvidia/xorg/nvidia_drv.so: pattern nvidia/xorg/nvidia_drv.so not found
WARN[0000] Could not locate nvidia/xorg/libglxserver_nvidia.so.390.157: pattern nvidia/xorg/libglxserver_nvidia.so.390.157 not found
WARN[0000] Could not locate X11/xorg.conf.d/10-nvidia.conf: pattern X11/xorg.conf.d/10-nvidia.conf not found
Where are these located on your system?
Where should I look for these patterns? There are symlinks in many places.
In `/usr/lib/nvidia` there are the symlinks `nvidia_drv.so` and `libglx.so`.
I could not find `X11/xorg.conf.d/10-nvidia.conf`.
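One way to locate the X driver components is to walk the usual install roots for the file names from the warnings. A sketch; the directory list is an assumption, and Debian's legacy-390xx packages may install elsewhere:

```python
import fnmatch
import os

# Candidate roots where distributions commonly install NVIDIA X components.
ROOTS = ["/usr/lib", "/usr/lib64", "/usr/share/X11", "/etc/X11"]
TARGETS = ["nvidia_drv.so", "libglxserver_nvidia.so*", "10-nvidia.conf"]

def find_x_components(roots=ROOTS, targets=TARGETS) -> list[str]:
    """Walk the candidate roots and collect files matching the target names."""
    hits = []
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):  # silently skips errors
            for name in files:
                if any(fnmatch.fnmatch(name, t) for t in targets):
                    hits.append(os.path.join(dirpath, name))
    return hits

print(find_x_components())
```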
Is it possible to use the nvidia-container-toolkit with the notebook NVIDIA NVS 5400M GPU on Linux? The latest compatible driver for it is 390.157, which supports up to CUDA 9.1.
If not, is there an older version of nvidia-container-toolkit that works with this driver?
P.S.: I would like to run a Docker container with a Gazebo installation with hardware-accelerated graphics.