At this time, there is no Yocto recipe for bitbaking nvidia-container-runtime into a Yocto build. I've tried installing the .deb packages from the NVIDIA SDK Manager, but this is apparently not enough to give containers access to the GPU: running any CUDA 10 example in a Docker container with nvidia-container-runtime fails with an error stating that no CUDA-ready devices were found. The same examples run without problems outside the container.

Looking at the rootfs of the default NVIDIA Ubuntu image created by the SDK Manager, I can see there are a lot of libraries and configuration files that are not available in Yocto. Specifically, I've come across the CSV files in /etc/nvidia-container-runtime/host-files-for-container.d/. These look promising, as they list a large number of files that are merged into any container run with nvidia-container-runtime.
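To see which CSV entries are absent from a given rootfs, something like the following could be used. It is a minimal sketch, assuming each CSV line has the form "type, path" (for example "lib, /usr/lib/libcuda.so"); the function names and the `root` parameter are hypothetical helpers, not part of any NVIDIA tool.

```python
import csv
import os
from pathlib import Path

# Directory named in the post; the "type, path" line layout is an assumption.
CSV_DIR = "/etc/nvidia-container-runtime/host-files-for-container.d"


def missing_entries(csv_path, root="/"):
    """Return (type, path) pairs from csv_path whose path is absent under root."""
    missing = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f, skipinitialspace=True):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            kind, path = row[0].strip(), row[1].strip()
            target = os.path.join(root, path.lstrip("/"))
            # lexists() also counts dangling symlinks as present
            if not os.path.lexists(target):
                missing.append((kind, path))
    return missing


def report(csv_dir=CSV_DIR, root="/"):
    """Print every missing entry for each CSV file in csv_dir."""
    for csv_file in sorted(Path(csv_dir).glob("*.csv")):
        for kind, path in missing_entries(csv_file, root):
            print(f"{csv_file.name}: {kind} not found: {path}")
```

Calling report() on the target checks the running system; passing root= pointing at an unpacked Yocto image checks the image from the build host instead.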

Of these, only the l4t.csv file lists files that are "not found" on the Yocto image. I installed as many as possible through Yocto recipes, but many are still missing and not supported in Yocto. I removed the entries that could not be provided from Yocto from the list, one by one. Eventually, though, the error I get when trying to run a container with nvidia-container-runtime is:

$ docker run -it --rm --runtime nvidia --gpus all nvcr.io/nvidia/l4t-base:r32.3.1
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: src: /usr/lib/libcudnn.so.7, src_lnk: libcudnn.so.7.6.3, dst: /var/lib/docker/overlay2/9b07e768b59dce6c3cd1b3b94a8019978b9bc24d84511bd37d82679efc94b829/merged/usr/lib/libcudnn.so.7, dst_lnk: libcudnn.so.7.6.3\\\\nsrc: /usr/lib/libnvcaffe_parser.so.6, src_lnk: libnvparsers.so.6.0.1, dst: /var/lib/docker/overlay2/9b07e768b59dce6c3cd1b3b94a8019978b9bc24d84511bd37d82679efc94b829/merged/usr/lib/libnvcaffe_parser.so.6, dst_lnk: libnvparsers.so.6.0.1\\\\nsrc: /usr/lib/libnvcaffe_parser.so.6.0.1, src_lnk: libnvparsers.so.6.0.1, dst: /var/lib/docker/overlay2/9b07e768b59dce6c3cd1b3b94a8019978b9bc24d84511bd37d82679efc94b829/merged/usr/lib/libnvcaffe_parser.so.6.0.1, dst_lnk: libnvparsers.so.6.0.1\\\\nsrc: /usr/lib/libnvinfer.so.6, src_lnk: libnvinfer.so.6.0.1, dst: /var/lib/docker/overlay2/9b07e768b59dce6c3cd1b3b94a8019978b9bc24d84511bd37d82679efc94b829/merged/usr/lib/libnvinfer.so.6, dst_lnk: libnvinfer.so.6.0.1\\\\nsrc: /usr/lib/libnvinfer_plugin.so.6, src_lnk: libnvinfer_plugin.so.6.0.1, dst: /var/lib/docker/overlay2/9b07e768b59dce6c3cd1b3b94a8019978b9bc24d84511bd37d82679efc94b829/merged/usr/lib/libnvinfer_plugin.so.6, dst_lnk: libnvinfer_plugin.so.6.0.1\\\\nsrc: /usr/lib/libnvonnxparser.so.6, src_lnk: libnvonnxparser.so.6.0.1, dst: /var/lib/docker/overlay2/9b07e768b59dce6c3cd1b3b94a8019978b9bc24d84511bd37d82679efc94b829/merged/usr/lib/libnvonnxparser.so.6, dst_lnk: libnvonnxparser.so.6.0.1\\\\nsrc: /usr/lib/libnvonnxparser_runtime.so.6, src_lnk: libnvonnxparser_runtime.so.6.0.1, dst: /var/lib/docker/overlay2/9b07e768b59dce6c3cd1b3b94a8019978b9bc24d84511bd37d82679efc94b829/merged/usr/lib/libnvonnxparser_runtime.so.6, dst_lnk: libnvonnxparser_runtime.so.6.0.1\\\\nsrc: /usr/lib/libnvparsers.so.6, src_lnk: 
libnvparsers.so.6.0.1, dst: /var/lib/docker/overlay2/9b07e768b59dce6c3cd1b3b94a8019978b9bc24d84511bd37d82679efc94b829/merged/usr/lib/libnvparsers.so.6, dst_lnk: libnvparsers.so.6.0.1\\\\n, stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --compat32 --graphics --utility --video --display --pid=6358 /var/lib/docker/overlay2/9b07e768b59dce6c3cd1b3b94a8019978b9bc24d84511bd37d82679efc94b829/merged]\\\\nnvidia-container-cli: mount error: (null)\\\\n\\\"\"": unknown.
ERRO[0003] error waiting for container: context cancelled

The error "nvidia-container-cli: mount error: (null)" is one I cannot figure out how to solve.
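The hook's stdout lists src/src_lnk/dst/dst_lnk entries just before the mount error. A minimal sketch for pulling those entries out of a pasted error message and checking the host-side symlinks is shown here. The field layout is taken from the log above; the function names are hypothetical, and whether a bad link is what actually triggers the mount error is not confirmed.

```python
import os
import re

# Matches one "src: ..., src_lnk: ..., dst: ..., dst_lnk: ..." entry.
# Backslashes are excluded from values so escaped "\\n" separators in the
# Docker error do not get swallowed into the last field.
PAIR_RE = re.compile(
    r"src:\s+([^\s,\\]+),\s+src_lnk:\s+([^\s,\\]+),"
    r"\s+dst:\s+([^\s,\\]+),\s+dst_lnk:\s+([^\s,\\]+)"
)


def parse_symlink_pairs(hook_output):
    """Return a list of (src, src_lnk, dst, dst_lnk) tuples found in the text."""
    return PAIR_RE.findall(hook_output)


def broken_host_links(pairs, root="/"):
    """Return (src, reason) for host symlinks that are missing or point elsewhere."""
    problems = []
    for src, src_lnk, _dst, _dst_lnk in pairs:
        host_path = os.path.join(root, src.lstrip("/"))
        if not os.path.islink(host_path):
            problems.append((src, "missing or not a symlink"))
        elif os.readlink(host_path) != src_lnk:
            problems.append((src, f"points to {os.readlink(host_path)}"))
    return problems
```

This only inspects the host side; the dst paths live in the container's overlay and disappear once the failed container is cleaned up.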

Below is information about the build:

OS: Linux nano 4.9.140-l4t-r32.3.1+g47e7e1c #1 SMP PREEMPT Mon Jan 20 08:52:22 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

YOCTO INFO: Yocto/poky zeus r32.3.1 (JetPack 4.3)


I am stuck on getting this to work, as I've tried almost everything I can think of. Does anyone know anything about getting nvidia-container-runtime to work with Yocto?

arnoldfychen commented 4 years ago

After installing nvidia-repo-l4t, try installing nvidia-container-csv-cuda and nvidia-container-csv-cudnn to ensure the container can use CUDA and cuDNN on the host.

arnoldfychen commented 4 years ago

For building NVIDIA Docker on Yocto, you may find some useful information on this page: https://blogs.windriver.com/wind_river_blog/2020/05/nvidia-container-runtime-for-wind-river-linux

elezar commented 1 year ago

The NVIDIA Container Toolkit has been updated with better support for Tegra-based systems. Note that these still require the CSV files that are defined by a platform vendor to function.

Please try with an up-to-date NVIDIA Container Toolkit version and, if problems persist, create an issue against the NVIDIA/nvidia-container-toolkit repository.