Open ben-xD opened 2 years ago
I had come across https://github.com/NVIDIA/nvidia-docker/issues/825#issuecomment-456198590:
> as part of v2 we prevent the container from starting if you have the NVIDIA driver
I am not sure if my issue is related to this. Perhaps @RenaudWasTaken would know? Thanks in advance :)
In general, only the containers packaged for l4t (in this case l4t-tensorrt) are designed to work on Jetson machines, e.g.:
docker run --runtime nvidia -it nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime
This is because these containers rely on the host to inject all CUDA and other support files into the container at runtime instead of bundling them inside the container image (keeping the container images themselves relatively small). The error you are seeing occurs because the container stack is trying to inject, at runtime, a file that is already bundled in the container image.
That said, you may be able to leverage a new feature of the container support for Jetson that limits the set of files injected into a container to only the base l4t files. That way you can run any container built for arm that just needs these base files in order to run.
You can do this by setting the following environment variable when you start the container.
NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=base-only
i.e.
$ docker run --runtime nvidia -it -e NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=base-only nvcr.io/nvidia/tensorrt:21.12-py3
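As a rough illustration of the rule described above, here is a minimal sketch of a helper that prints (but does not run) the `docker run` command for a given image. The function name `jetson_run_cmd` is hypothetical, and matching on the `nvcr.io/nvidia/l4t-` prefix is only an assumed heuristic for "packaged for l4t"; other images may need different handling.

```shell
# Hypothetical helper: prints the docker command to use on a Jetson host.
# Assumption: images under nvcr.io/nvidia/l4t-* are the ones packaged for l4t.
jetson_run_cmd() {
  image="$1"
  case "$image" in
    nvcr.io/nvidia/l4t-*)
      # l4t images expect the host to inject the full set of support files.
      printf 'docker run --runtime nvidia -it %s\n' "$image"
      ;;
    *)
      # Other arm images: limit injection to the base l4t files only.
      printf 'docker run --runtime nvidia -it -e NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=base-only %s\n' "$image"
      ;;
  esac
}
```

For example, `jetson_run_cmd nvcr.io/nvidia/tensorrt:21.12-py3` prints the same command as shown above, with the `base-only` variable set.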
Problem
Support for Jetson platforms has been in beta for more than a year. Unfortunately, the following simple command does not work on my Jetson. Fortunately, this is very easy to reproduce; just run this:
Error
You will get:
And let me pick out the juiciest part:
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/id/merged/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer_static_v8.a: file exists: unknown
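To see which file inside the image collides with a host-injected file, the in-container path can be pulled out of the error message. The following is a minimal sketch; the function name `conflicting_path` is hypothetical, and it assumes the error has the same shape as the line above (an overlay2 path containing a `/merged` directory).

```shell
# Hypothetical helper: reads error text on stdin and prints the path,
# relative to the container root, that the runtime failed to create.
# Assumes the overlay2 layout seen above: .../overlay2/<id>/merged/<path>
conflicting_path() {
  sed -n 's|.*file creation failed: .*/merged\(/[^:]*\): file exists.*|\1|p'
}
```

Piping the error line above through `conflicting_path` yields `/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer_static_v8.a`, which is the file already bundled in the image.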
My details:
Running `nvidia-container-cli -k -d /dev/tty info`
```
-- WARNING, the following logs are for debugging purposes only --

I0128 16:30:01.821611 9072 nvc.c:281] initializing library context (version=0.10.0+jetpack, build=61f57bcdf7aa6e73d9a348a7e36ec9fd94128fb2)
I0128 16:30:01.821757 9072 nvc.c:255] using root /
I0128 16:30:01.821803 9072 nvc.c:256] using ldcache /etc/ld.so.cache
I0128 16:30:01.821874 9072 nvc.c:257] using unprivileged user 1002:1002
I0128 16:30:01.822601 9073 driver.c:134] starting driver service
I0128 16:30:01.831030 9072 driver.c:231] driver service terminated with signal 15
nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected
```

`jetson_release -v`
```
 - NVIDIA Jetson AGX Xavier [16GB]
   * Jetpack 4.5.1 [L4T 32.5.2]
   * NV Power Mode: MAXN - Type: 0
   * jetson_stats.service: active
 - Board info:
   * Type: AGX Xavier [16GB]
   * SOC Family: tegra194 - ID:25
   * Module: P2888-0001 - Board: P2822-0000
   * Code Name: galen
   * CUDA GPU architecture (ARCH_BIN): 7.2
   * Serial Number: 1421021087906
 - Libraries:
   * CUDA: 10.2.89
   * cuDNN: 8.0.0.180
   * TensorRT: 7.1.3.0
   * Visionworks: 1.6.0.501
   * OpenCV: NOT_INSTALLED compiled CUDA: NO
   * VPI: ii libnvvpi1 1.0.15 arm64 NVIDIA Vision Programming Interface library
   * Vulkan: 1.2.70
 - jetson-stats:
   * Version 3.1.2
   * Works on Python 3.6.9
```

My `uname -a`:

`docker version`
```
Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.8
 Git commit:        20.10.7-0ubuntu5~18.04.3
 Built:             Mon Nov  1 01:04:31 2021
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       20.10.7-0ubuntu5~18.04.3
  Built:            Fri Oct 22 00:57:37 2021
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.5.5-0ubuntu3~18.04.1
  GitCommit:
 runc:
  Version:          1.0.1-0ubuntu2~18.04.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:
```

My `dpkg -l '*nvidia*'`
```
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-====================================-=======================-=======================-=============================================================================
un libgldispatch0-nvidia
```

`nvidia-container-cli -V`
```
cli-version: 1.7.0
lib-version: 0.10.0+jetpack
build date: 2021-11-30T19:53+00:00
build revision: f37bb387ad05f6e501069d99e4135a97289faf1f
build compiler: aarch64-linux-gnu-gcc-7 7.5.0
build platform: aarch64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
```

`/var/log/nvidia-container-runtime.log` **logs**
```
2022/01/28 18:33:07 Using bundle directory: /run/containerd/io.containerd.runtime.v2.task/moby/9b8ee90ce1ae157106936445c2c429ab33f6293d4a2f9e1c400b60917d597a97
2022/01/28 18:33:07 Using OCI specification file path: /run/containerd/io.containerd.runtime.v2.task/moby/9b8ee90ce1ae157106936445c2c429ab33f6293d4a2f9e1c400b60917d597a97/config.json
2022/01/28 18:33:07 Looking for runtime binary 'docker-runc'
2022/01/28 18:33:07 Runtime binary 'docker-runc' not found: exec: "docker-runc": executable file not found in $PATH
2022/01/28 18:33:07 Looking for runtime binary 'runc'
2022/01/28 18:33:07 Found runtime binary '/usr/sbin/runc'
2022/01/28 18:33:07 Running nvidia-container-runtime
2022/01/28 18:33:07 'create' command detected; modification required
2022/01/28 18:33:07 prestart hook path: /usr/bin/nvidia-container-runtime-hook
2022/01/28 18:33:07 Forwarding command to runtime
2022/01/28 18:33:07 Using bundle directory:
2022/01/28 18:33:07 Using OCI specification file path: config.json
2022/01/28 18:33:07 Looking for runtime binary 'docker-runc'
2022/01/28 18:33:07 Runtime binary 'docker-runc' not found: exec: "docker-runc": executable file not found in $PATH
2022/01/28 18:33:07 Looking for runtime binary 'runc'
2022/01/28 18:33:07 Found runtime binary '/usr/sbin/runc'
2022/01/28 18:33:07 Running nvidia-container-runtime
2022/01/28 18:33:07 No modification required
2022/01/28 18:33:07 Forwarding command to runtime
```

`dmesg`
```
[782180.143499] docker0: port 1(veth7284103) entered blocking state
[782180.143505] docker0: port 1(veth7284103) entered disabled state
[782180.143995] device veth7284103 entered promiscuous mode
[782180.153901] IPv6: ADDRCONF(NETDEV_UP): veth7284103: link is not ready
[782180.579633] eth0: renamed from veth7260057
[782180.602680] IPv6: ADDRCONF(NETDEV_CHANGE): veth7284103: link becomes ready
[782180.603100] docker0: port 1(veth7284103) entered blocking state
[782180.603108] docker0: port 1(veth7284103) entered forwarding state
[782185.791137] docker0: port 1(veth7284103) entered disabled state
[782185.791615] veth7260057: renamed from eth0
[782185.854805] docker0: port 1(veth7284103) entered disabled state
[782185.864577] device veth7284103 left promiscuous mode
[782185.864587] docker0: port 1(veth7284103) entered disabled state
```

- `nvidia-container-cli -k -d /dev/tty info`
- `uname -a`
- `dmesg`
- `nvidia-smi -a` - this is not available on Jetson
- `docker version`
- `dpkg -l '*nvidia*'` or `rpm -qa '*nvidia*'`
- `nvidia-container-cli -V`
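
For anyone gathering the same information, here is a minimal sketch of a script that runs the commands listed above and skips any that are not installed (for example `nvidia-smi` on Jetson). The function names `run_diag` and `collect_diagnostics` are hypothetical, and `dmesg` or `docker version` may need elevated privileges on some systems.

```shell
# Hypothetical helper: run one diagnostic command under a header line.
# A command that is not installed is reported instead of aborting the run.
run_diag() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '### %s\n' "$*"
    "$@" 2>&1
  else
    printf '### %s\n(not available on this machine)\n' "$*"
  fi
}

# Runs the commands from the checklist above, in the same order.
collect_diagnostics() {
  run_diag nvidia-container-cli -k -d /dev/tty info
  run_diag uname -a
  run_diag dmesg
  run_diag nvidia-smi -a
  run_diag docker version
  run_diag dpkg -l '*nvidia*'
  run_diag nvidia-container-cli -V
}
```

Calling `collect_diagnostics > report.txt 2>&1` would write everything to a single file (the name `report.txt` is just an example) that can be pasted into an issue.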
Things I found
nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected
- I'm not sure why no devices are found. It is an NVIDIA Jetson AGX Xavier [16GB], Jetpack 4.5.1 [L4T 32.5.2].