Open rahulswa08 opened 1 year ago
Hi @rahulswa08, on JetPack 5, CUDA/cuDNN/TensorRT/etc. are installed inside the container (unlike JetPack 4, where they get mounted into the container from the host device by the NVIDIA runtime). So you would just perform the upgrade inside the container. I've not tried changing the CUDA version before, though.
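For anyone verifying this on their own device, a quick sanity check from inside a JetPack 5 container would look roughly like this (a sketch only; the l4t-jetpack image tag is an assumption and should be matched to your L4T release):
# on JetPack 5 the CUDA toolkit is baked into the image itself,
# so nvcc and /usr/local/cuda exist without any host mounts
sudo docker run --runtime nvidia -it --rm nvcr.io/nvidia/l4t-jetpack:r35.4.1 \
    bash -c "/usr/local/cuda/bin/nvcc --version && ls /usr/local/cuda"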
Thanks @dusty-nv. Since the container has its own CUDA, I have tried upgrading CUDA inside the container using the instructions provided here.
Please ensure your device is configured per the [CUDA Tegra Setup Documentation](https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#upgradable-package-for-jetson).
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/arm64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-tegra-repo-ubuntu2004-11-8-local_11.8.0-1_arm64.deb
sudo dpkg -i cuda-tegra-repo-ubuntu2004-11-8-local_11.8.0-1_arm64.deb
sudo cp /var/cuda-tegra-repo-ubuntu2004-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
But when I perform the upgrade, I hit the following issue at the last step (sudo apt-get -y install cuda):
The following packages have unmet dependencies:
cuda : Depends: cuda-11.8 (>= 11.8) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
I'm not sure why I'm facing this. I'm able to perform the upgrade on the Jetson host by following the same steps, but not inside the container.
Am I doing anything wrong here, or is this a limitation?
Could you help me solve this issue?
Thanks!!
@rahulswa08 can you try installing the cuda-11.8 package instead of cuda? Or maybe try the --only-upgrade flag to apt-get? I haven't upgraded CUDA before in the containers.
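For reference, those two suggestions would look roughly like this (a sketch only; note the apt package is named cuda-11-8, and whether it resolves depends on the repo you added):
# try the versioned meta-package instead of the generic 'cuda' one
apt-get update
apt-get install -y cuda-11-8
# or only upgrade CUDA packages that are already installed
apt-get install -y --only-upgrade cuda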
I have tried installing cuda-11.8, but it leads to another missing dependency, and that one leads to yet another. I'm unable to resolve them by installing each of them manually. I haven't tried the --only-upgrade option.
If --only-upgrade doesn't work and you are unable to resolve the dependencies, you could try uninstalling the previous CUDA from the container first. Or it may be cleaner for you to just start with l4t-base, then install your desired CUDA Toolkit/etc. on top of that, then PyTorch and so on.
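Removing the existing CUDA packages first might look something like this (a rough sketch, run as root inside the container; check with dpkg -l which CUDA packages your image actually ships, since the patterns below are only a guess):
# see which CUDA-related packages are installed in the container
dpkg -l | grep -i cuda
# purge them before installing the newer toolkit (patterns are an assumption)
apt-get purge -y 'cuda-*' 'libcudnn*'
apt-get autoremove -y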
I've encountered the same issues, starting from each of nvcr.io/nvidia/l4t-cuda:11.4.19-devel, nvcr.io/nvidia/l4t-cuda:11.4.19-runtime, nvcr.io/nvidia/l4t-base:35.4.1, nvcr.io/nvidia/l4t-base:35.3.1 and nvcr.io/nvidia/l4t-base:35.2.1, when following the documented procedure found here: https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Linux&target_arch=aarch64-jetson&Compilation=Native&Distribution=Ubuntu&target_version=20.04&target_type=deb_local
Having tested both the network and local repo methodologies, the network repo seems to be targeted toward the multi-platform CUDA images (for example https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags), as everything is cross-dependent on cuda-12.2 packages (essentially a documentation issue for the above webpage). But when pinned to CUDA 11.8, the behavior is the same as with the local repo methodology, wherein you get circular dependencies among the various CUDA packages at 11.8. So far I've not tested the various force or ignore-dependencies approaches, as they would inevitably lead to unstable images. Certainly the preferred approach would be to resolve the underlying circular dependency issue.
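For reference, pinning the network repo to the 11.8 series via apt preferences would look roughly like this (a sketch; the file name and priority value are arbitrary choices, not something from NVIDIA's documentation):
# /etc/apt/preferences.d/cuda-11-8-pin
# keep apt from resolving the cuda meta-packages to the 12.x series
Package: cuda*
Pin: version 11.8*
Pin-Priority: 1001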
As it turns out, the dependency tree ends at the unresolvable dependency on nvidia-l4t-core, which is a board support package meant for the host hardware, not containers. The dependency itself seems to be a holdover from the JetPack 4.5.x days, when CUDA was meant to run outside the containers. The issue might be resolvable by correcting and rebuilding cuda-compat-11-8.
For reference, the (consolidated) tree looks like this:
# apt-get install cuda-11.8
cuda-11-8 : Depends: cuda-runtime-11-8 (>= 11.8.0) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
# apt-get install cuda-runtime-11-8
cuda-runtime-11-8 : Depends: cuda-compat-11-8 (>= 11.8.31339915) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
# apt-get install cuda-compat-11.8
cuda-compat-11-8 : PreDepends: nvidia-l4t-core but it is not installable
E: Unable to correct problems, you have held broken packages.
Further discussion of this issue related to nvidia-l4t-core (while not directly on point) can be found here: https://forums.developer.nvidia.com/t/installing-nvidia-l4t-core-package-in-a-docker-layer/153412
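One way to walk the chain above yourself is with apt-cache from inside the container, once the CUDA repo has been added (a sketch):
# follow the dependency chain one level at a time
apt-cache depends cuda-11-8
apt-cache depends cuda-runtime-11-8
apt-cache depends cuda-compat-11-8    # shows the PreDepends on nvidia-l4t-core
# confirm that nvidia-l4t-core is not available from the configured repos
apt-cache policy nvidia-l4t-core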
https://hackmd.io/ZmWQz8azTdWNVoCc9Bf3QA If not, wait for JetPack 6 at the end of the month.
@johnnynunez Congratulations on your article, but it doesn't seem to address the issue at hand - that being deploying CUDA 11.8 INSIDE a container.
If not, wait for JetPack 6 at the end of the month.
I'm also a bit baffled by your assertion that the release of Jetpack 6 might include recompilation and correction of the dependency flaw, especially since no such recompilation was completed as part of the Jetpack 5.x roadmap. If you have information that this differs for the 6.0 release, please share that documented roadmap.
Only @dusty-nv or @tokk-nv can confirm some things here.
So we can only wait for JetPack 6.0. I do not work at NVIDIA, but I think NVIDIA's idea is to treat the Jetson as if it were a regular GPU: you would be able to install the open DKMS kernel modules and get precompiled builds of cuDNN and the other libraries as a matter of course, as other devices such as Grace Hopper (ARM-based) already do.
@johnnynunez @hillct here is another thread to keep an eye on: https://forums.developer.nvidia.com/t/use-cuda-12-2-in-a-container/271600
OK, I found a workaround for this by manually extracting the cuda-compat deb inside the container, and then installing the cuda-toolkit or cuda-libraries package instead (only cuda and cuda-runtime depend on cuda-compat/nvidia-l4t-core).
#
# sudo docker build --network=host --tag cuda:12.2 .
# sudo docker run --runtime nvidia -it --rm --network host cuda:12.2 cuda-samples/bin/aarch64/linux/release/deviceQuery
#
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y --no-install-recommends \
wget \
git \
binutils \
xz-utils \
ca-certificates \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# download the CUDA Toolkit local installer
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/arm64/cuda-ubuntu2004.pin -O /etc/apt/preferences.d/cuda-repository-pin-600 && \
wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda-tegra-repo-ubuntu2004-12-2-local_12.2.2-1_arm64.deb && \
dpkg -i cuda-tegra-repo-*.deb && \
rm cuda-tegra-repo-*.deb
# add the signed keys
RUN cp /var/cuda-tegra-repo-*/cuda-tegra-*-keyring.gpg /usr/share/keyrings/
# manually extract cuda-compat
RUN mkdir /var/cuda-compat && \
cd /var/cuda-compat && \
ar x ../cuda-tegra-repo-*/cuda-compat-*.deb && \
tar xvf data.tar.xz -C / && \
rm -rf /var/cuda-compat
# install cuda-toolkit (doesn't depend on cuda-compat/nvidia-l4t-core)
RUN apt-get update && \
apt-get install -y --no-install-recommends \
cuda-toolkit-* \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# environment variables
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=all
ENV CUDA_HOME="/usr/local/cuda"
ENV PATH="/usr/local/cuda/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/cuda/compat:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"
# build cuda samples
RUN git clone --branch=v12.2 https://github.com/NVIDIA/cuda-samples && \
cd cuda-samples/Samples/1_Utilities/deviceQuery && \
make
WORKDIR /
Tried this on a board running JetPack 5.1.2 / L4T R35.4.1, which did not have CUDA 12.2 installed outside the container - and it worked (YMMV)
Thank you very much @dusty-nv it worked on the first try, and no problems were encountered.
Just for completeness, in case others come across this issue in the future, the alternate approach is to force the installation of the dependency, as in this example. It should be noted that you can specify CUDA=11-8 or CUDA=12-2 to get the desired results at build time.
ARG BASE_IMAGE=nvcr.io/nvidia/l4t-base:35.3.1
FROM ${BASE_IMAGE} as base
ARG DEBIAN_FRONTEND=noninteractive
ARG sm=87
# skip setting this if you want to enable the OpenMPI backend
ARG USE_DISTRIBUTED=1
ARG USE_QNNPACK=0
ARG CUDA=11-8
# nvidia-l4t-core is a dependency for the rest
# of the packages, and is designed to be installed directly
# on the target device. This is because it parses /proc/device-tree
# in the deb's .preinst script. Looks like we can bypass it though:
RUN \
echo "deb https://repo.download.nvidia.com/jetson/common r35.3 main" >> /etc/apt/sources.list && \
echo "deb https://repo.download.nvidia.com/jetson/t194 r35.3 main" >> /etc/apt/sources.list && \
apt-key adv --fetch-key http://repo.download.nvidia.com/jetson/jetson-ota-public.asc && \
mkdir -p /opt/nvidia/l4t-packages/ && \
touch /opt/nvidia/l4t-packages/.nv-l4t-disable-boot-fw-update-in-preinstall && \
rm -f /etc/ld.so.conf.d/nvidia-tegra.conf && apt-get update && \
apt-get install -y --no-install-recommends nvidia-l4t-core && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/arm64/cuda-keyring_1.0-1_all.deb && \
dpkg -i cuda-keyring_1.0-1_all.deb && apt-get update && apt-get install -y --no-install-recommends cuda-${CUDA} && \
apt-get -y upgrade && apt-get clean && rm -rf /var/lib/apt/lists/* cuda-keyring_1.0-1_all.deb
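Building and smoke-testing with the desired CUDA series would then look something like this (the image tag is arbitrary; note this Dockerfile does not add CUDA to PATH, hence the full path to nvcc):
sudo docker build --network=host --build-arg CUDA=12-2 --tag cuda-forced:12.2 .
sudo docker run --runtime nvidia -it --rm cuda-forced:12.2 /usr/local/cuda/bin/nvcc --version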
I've not yet done a comparison of the final images, but given the methodology, it's likely that @dusty-nv's approach would be slower to build (owing to the large download requirement) but of similar final size.
Hello,
I have a board running JetPack 5.1.4 / L4T R35.4.1. I am working on a project that requires Python 3.9 and CUDA 12.2. I can get @dusty-nv's solution working and I can get PyTorch installed. However, when I check for the presence of the GPU using torch.cuda.is_available(), it returns None. The same is true when the setup script checks for $CUDA_HOME. Some of my dependencies require these to compile.
So far, my steps have been to create the container image as indicated by dusty and then use this image to create the container with Python 3.9 and the rest of my project.
Any help and or advice is greatly appreciated.
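For anyone diagnosing something similar, the basic checks would be along these lines (a rough sketch; my-project-image stands in for the actual image name):
# start the container with the NVIDIA runtime so the driver libraries are exposed
sudo docker run --runtime nvidia -it --rm my-project-image bash
# inside the container: confirm the toolkit location and the environment
ls /usr/local/cuda/bin/nvcc
echo $CUDA_HOME    # should print /usr/local/cuda if the ENV from dusty's Dockerfile carried over
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"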
Hi @dusty-nv, I'm currently using the ros:noetic-pytorch-l4t-r34.1.1 base image on a Jetson AGX Orin 32GB, with CUDA 11.4 installed. However, I need CUDA 11.8 in my container. Do I need to upgrade CUDA on the Jetson host for this, or can I upgrade to CUDA 11.8 within this image?