The setup is a bit simpler than it was when Pablo wrote that blog post. You don't have to explicitly include the -container-csv packages - they'll be pulled in automatically by the packages they're associated with. So something like this:
IMAGE_INSTALL_append = " nvidia-docker cudnn tensorrt libvisionworks libvisionworks-sfm libvisionworks-tracking cuda-libraries"
should work better. The exact set of packages you need to add will depend on which NGC container you intend to run.
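If it helps to see what each of those packages exposes to containers, the associated -container-csv packages install CSV files that the NVIDIA container runtime reads at container start. The listing below is a sketch assuming the L4T default location and typical file names; adjust to what your image actually installs:

# list the per-package mount descriptions the runtime will apply
ls /etc/nvidia-container-runtime/host-files-for-container.d/
# e.g. inspect which host libraries the cudnn package maps into containers
cat /etc/nvidia-container-runtime/host-files-for-container.d/cudnn.csv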
With these, I am getting fetch errors for a lot of the CUDA packages with this release; it seems the path has been moved - https://repo.download.nvidia.com/jetson/common/pool/main/c/cuda/ does not exist.
I am using the branch dunfell-l4t-r32.4.3
ERROR: cuda-cuobjdump-10.2.89-1-r0 do_fetch: Fetcher failure for URL: 'https://repo.download.nvidia.com/jetson/common/pool/main/c/cuda/cuda-cuobjdump-10-0_10.2.89-1_arm64.deb;name=main;subdir=cuda-cuobjdump-10.2.89-1'
Can you please check?
I saw the same, but now it's working for me. I suspect this was a problem at NVIDIA's end, probably due to their pushing out a new release. Give it another try.
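If a fetch failure like that sticks around for a particular recipe, a minimal retry sequence looks something like the following (the recipe name is taken from the error above and the image name from the listing further down; substitute your own):

# discard the recipe's failed download and sstate, then rebuild the image
bitbake -c cleanall cuda-cuobjdump
bitbake core-image-minimal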
Should I set CUDA_VERSION to 10.2?
So I was able to build. But do the NVIDIA packages really take this much space?
build_tegra$ ls tmp/deploy/images/jetson-nano-qspi-sd/ -hl | grep flash
-rw-r--r-- 1 ubuntu ubuntu 655M Oct 24 14:05 core-image-minimal-jetson-nano-qspi-sd-20201024081103.tegraflash.tar.gz
-rw-r--r-- 2 ubuntu ubuntu 83M Oct 24 14:36 core-image-minimal-jetson-nano-qspi-sd-20201024090633.tegraflash.tar.gz
The latter is the one without IMAGE_INSTALL_append = " nvidia-docker cudnn tensorrt libvisionworks libvisionworks-sfm libvisionworks-tracking cuda-libraries"
Should I set CUDA_VERSION to 10.2?
You shouldn't have to. It will get set automatically.
But do the NVIDIA packages really take this much space?
Yes, they run quite large.
Getting an error with libnvinfer:
root@1fce794aad39:/jetson-inference/build/aarch64/bin# ls -l /usr/lib/aarch64-linux-gnu/libnvinfer*
lrwxrwxrwx 1 root root 19 Oct 27 19:46 /usr/lib/aarch64-linux-gnu/libnvinfer.so -> libnvinfer.so.7.1.3
lrwxrwxrwx 1 root root 19 Oct 27 19:46 /usr/lib/aarch64-linux-gnu/libnvinfer.so.7 -> libnvinfer.so.7.1.3
-rw-r--r-- 1 root root 0 Jul 1 20:05 /usr/lib/aarch64-linux-gnu/libnvinfer.so.7.1.3
lrwxrwxrwx 1 root root 26 Oct 27 19:46 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so -> libnvinfer_plugin.so.7.1.3
lrwxrwxrwx 1 root root 26 Oct 27 19:46 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7 -> libnvinfer_plugin.so.7.1.3
-rw-r--r-- 1 root root 0 Jul 1 20:05 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.1.3
The zero length for the libraries certainly doesn't look right.
What does ls /usr/lib/libnvinfer* look like outside the container? Are the libraries and symlinks present?
Since there isn't any documentation with that container on what its specific dependencies are, you're going to have to track them down yourself and ensure that all of the packages (including the -dev packages) are installed in your image. Enabling debug logging for the container runtime could help with this - try uncommenting the debug lines in /etc/nvidia-container-runtime/config.toml and see if the log files it generates help with identifying missing mappings.
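For reference, the relevant part of /etc/nvidia-container-runtime/config.toml usually looks roughly like the fragment below (the log paths can differ between releases); uncomment the debug entries, re-run the container, and check the named log files:

[nvidia-container-cli]
#debug = "/var/log/nvidia-container-toolkit.log"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"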
Is there a container that uses the NVIDIA GPU which you have been able to run on Yocto? I am trying hard to find a sample to test, to confirm that my headless Yocto image can be used for some kind of object recognition / AI.
The ones I typically test with are the L4T-Base and DeepStream-L4T containers from NVIDIA's NGC catalog, running them in a demo-image-full
image built from our reference distro.
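As a concrete smoke test, the L4T-Base container can be started the same way as the TensorFlow one later in this thread; the tag below is an assumption chosen to match the r32.4.3 release being used here:

docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-base:r32.4.3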
It's not what you are asking for but we do have some example containers using L4T for image sensor access inside a container at https://gitlab.com/boulderai/bai-edge-sdk in case that's useful. We don't have object recognition/AI examples yet but plan to add those in the future.
I was trying to run l4t-tensorflow with demo-image-full, but I am getting this error:
root@jetson-nano-qspi-sd:~# docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf1.15-py3
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused " process_linux.go:413: running prestart hook 0 caused \ "error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/c221fd0c4c06bd899c4650f35d4578556a5500fff952a3956b954948ab1dae27/merged/etc/vulkan/icd.d/nvidia_icd.json: file exists\n\""": unknown.
Any idea what could be wrong here? Is there a way to enable more verbose logging?
Surprisingly, if I build the same container from source, it works:
docker images
REPOSITORY          TAG                  IMAGE ID       CREATED          SIZE
l4t-tensorflow      r32.4.3-tf2.2-py3    07efcbb28832   51 minutes ago   2.45GB
l4t-tensorflow      r32.4.3-tf1.15-py3   f1f1704425aa   2 hours ago      2.09GB
What could be different when building locally vs. running the stock NGC container?
The filesystem layout is different between stock L4T and OE/Yocto builds, and the error you're seeing when using the problematic containers is due to L4T using a symlink for /etc/vulkan/icd.d/nvidia_icd.json whereas in our builds we just drop the actual JSON file in that location. The containers are also including the L4T-style symlink for some reason, rather than taking advantage of the runtime's automatic passthrough, and that symlink conflicts with the actual file we're passing through when the overlay filesystem is being composed for the container.
I think the least awful fix for this is to match that symlink in our builds to make them compatible. That will add a /usr/lib/aarch64-linux-gnu/tegra directory in the root filesystem, which is kind of ugly, but it appears the Vulkan loader only looks for the JSON file in /etc/vulkan/icd.d, and it can't handle the zero-length JSON file that would be present there in the container if we were to relocate our copy of the file to a different directory in its search path.
There's a similar issue with the libglvnd config files, but that library searches for its config files in multiple locations, and we install our config in a different place than L4T does, so there's no conflict - just some symlinks and 0-length files visible in the container that don't cause any errors.
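A quick way to see the mismatch described above is to compare that path on the host and inside the stock container image (the image name is the one from earlier in this thread; it's run without the NVIDIA runtime here so the conflicting mount isn't attempted):

# on the OE/Yocto host: a regular JSON file installed by the build
ls -l /etc/vulkan/icd.d/nvidia_icd.json
# inside the stock container image: an L4T-style symlink into /usr/lib/aarch64-linux-gnu/tegra/
docker run --rm nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf1.15-py3 ls -l /etc/vulkan/icd.d/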
Hi,
I am trying to build a Yocto image for the Jetson Nano with docker-ce and support for the NVIDIA container tools. I stumbled upon this guide: https://blogs.windriver.com/wind_river_blog/2020/05/nvidia-container-runtime-for-wind-river-linux/
I am not using WR Linux, and my bblayers.conf looks like:
local.conf:
The error I get is: Nothing RPROVIDES 'cuda-container-csv'
Not sure what I am missing. If there are any steps please let me know.
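In case it helps anyone hitting the same error: the layer and config setup that goes with the answer at the top of this thread typically looks roughly like the sketch below on dunfell. The layer paths are placeholders and the exact layer list comes from meta-virtualization's and meta-tegra's own READMEs, so treat this as illustrative only:

# bblayers.conf (placeholder paths)
BBLAYERS += " \
    /path/to/meta-tegra \
    /path/to/meta-virtualization \
    /path/to/meta-openembedded/meta-oe \
    /path/to/meta-openembedded/meta-python \
    /path/to/meta-openembedded/meta-networking \
    /path/to/meta-openembedded/meta-filesystems \
"

# local.conf
MACHINE = "jetson-nano-qspi-sd"
DISTRO_FEATURES_append = " virtualization"
IMAGE_INSTALL_append = " docker-ce nvidia-docker cudnn tensorrt cuda-libraries"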