dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

build error #728

Open HarrisonBT opened 3 days ago

HarrisonBT commented 3 days ago

When I run:

jetson-containers build --name=my_container pytorch transformers ros:humble-desktop

it ends with an error:

The command '/bin/sh -c /tmp/pytorch/install.sh || /tmp/pytorch/build.sh' returned a non-zero code: 1
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/user/jetson-containers/jetson_containers/build.py", line 112, in <module>
    build_container(args.name, args.packages, args.base, args.build_flags, args.build_args, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api, args.skip_packages)
  File "/home/user/jetson-containers/jetson_containers/container.py", line 147, in build_container
    status = subprocess.run(cmd.replace(_NEWLINE_, ' '), executable='/bin/bash', shell=True, check=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'sudo DOCKER_BUILDKIT=0 docker build --network=host --tag my_container:l4t-r36.3.0-pytorch_2.2 --file /home/user/jetson-containers/packages/pytorch/Dockerfile --build-arg BASE_IMAGE=my_container:l4t-r36.3.0-onnx --build-arg TORCH_CUDA_ARCH_ARGS="8.7" --build-arg TORCH_VERSION="2.2" --build-arg PYTORCH_BUILD_VERSION="2.2.0" --build-arg USE_NCCL="1" /home/user/jetson-containers/packages/pytorch 2>&1 | tee /home/user/jetson-containers/logs/20241119_153553/build/my_container_l4t-r36.3.0-pytorch_2.2.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.
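The Python traceback above appears to be only the wrapper reporting that docker build failed. It does not include the underlying error. The sketch below reproduces the pattern visible in the traceback and assumes a Linux host with /bin/bash. In that pattern, subprocess.run(..., executable='/bin/bash', shell=True, check=True) wraps a command that is piped into tee and ends in `exit ${PIPESTATUS[0]}`. The helper name run_like_jetson_containers is hypothetical, and `false` stands in for the failing docker build:

```python
import subprocess


def run_like_jetson_containers(cmd: str) -> int:
    """Run cmd through bash with check=True (as in the traceback) and
    return its exit status instead of letting CalledProcessError escape.
    Hypothetical helper for illustration only."""
    try:
        subprocess.run(cmd, executable="/bin/bash", shell=True, check=True)
        return 0
    except subprocess.CalledProcessError as exc:
        return exc.returncode


# `false` plays the role of the failing `docker build`.
# With `exit ${PIPESTATUS[0]}`, the first pipeline stage's status is forwarded.
print(run_like_jetson_containers("false 2>&1 | tee /dev/null; exit ${PIPESTATUS[0]}"))  # 1
# Without it, the pipeline reports tee's status, so the failure is hidden.
print(run_like_jetson_containers("false 2>&1 | tee /dev/null"))  # 0
```

The trailing `exit ${PIPESTATUS[0]}` is what makes CalledProcessError fire at all. The actual cause of the failure is in the docker build output printed before the traceback.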

dusty-nv commented 3 days ago

Hi @HarrisonBT, can you post the part of the build log further up, where the actual error is likely printed? The full log should also have been saved automatically under jetson-containers/logs.
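A small script like the one below may help pull the first failing step out of a saved log. It is a hypothetical helper and not part of jetson-containers. The ERROR_PATTERNS list is an assumption that may need tuning. The script prints the first matching line together with the few lines before it, which is usually where the real cause appears:

```python
import re
import sys
from pathlib import Path

# Assumed patterns for lines that announce a failure in a docker build log.
ERROR_PATTERNS = [
    re.compile(r"\berror\b", re.IGNORECASE),
    re.compile(r"\bfailed\b", re.IGNORECASE),
    re.compile(r"returned a non-zero code"),
]


def first_error(lines, context=5):
    """Return the first line matching ERROR_PATTERNS plus up to `context`
    preceding lines, or an empty list if nothing matches."""
    for i, line in enumerate(lines):
        if any(p.search(line) for p in ERROR_PATTERNS):
            start = max(0, i - context)
            return lines[start:i + 1]
    return []


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python first_error.py <path to a log under jetson-containers/logs>
    log_lines = Path(sys.argv[1]).read_text(errors="replace").splitlines()
    print("\n".join(first_error(log_lines)))
```

For example, `python first_error.py jetson-containers/logs/20241119_153553/build/my_container_l4t-r36.3.0-pytorch_2.2.txt` would run it against the log named in the traceback above (first_error.py is a hypothetical filename).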

HarrisonBT commented 19 hours ago

Hi @dusty-nv, I think it was caused by my Miniconda installation. As soon as I removed the initialization of the base environment from my .bashrc, the jetson-containers build of pytorch transformers ros:humble-desktop completed.
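This situation can be spotted before a build by checking whether the Python that runs jetson-containers comes from a conda environment. In the first traceback it was /home/user/miniconda3/lib/python3.12, while the later traceback uses the system /usr/lib/python3.10. The heuristic sketch below makes two assumptions: an activated environment exports CONDA_PREFIX, and each conda environment prefix contains a conda-meta directory. The function name is hypothetical:

```python
import os
import sys
from typing import Mapping, Optional


def conda_env_active(environ: Optional[Mapping[str, str]] = None,
                     prefix: Optional[str] = None) -> bool:
    """Heuristic (hypothetical helper): True if the shell or interpreter
    looks like it comes from a conda environment."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    # Assumption: `conda activate`, including auto-activation of base
    # from .bashrc, exports CONDA_PREFIX.
    if environ.get("CONDA_PREFIX"):
        return True
    # Assumption: a conda environment prefix contains a conda-meta/ directory.
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


if __name__ == "__main__":
    if conda_env_active():
        print("conda environment detected; consider `conda deactivate` before building")
    else:
        print("no conda environment detected")
```

As an alternative to deleting the conda init block from .bashrc, `conda config --set auto_activate_base false` should keep conda available while stopping it from activating base in new shells.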

However, I've now noticed another problem: the ROS 2 Humble desktop build for l4t-r36.3.0 must not be building correctly. I can see the topics with ros2 topic list, but ros2 topic echo shows nothing coming through. Even when I set up a minimal subscriber in Python, nothing reaches the callback.

So now I'm looking at using --base=dustynv/ros:humble-desktop-l4t-r36.2.0 as the base image and adding transformers on top of it (I confirmed that your prebuilt r36.2.0 image works).

I just tried this:

jetson-containers build --base=dustynv/ros:humble-desktop-l4t-r36.2.0 --name=humble_transformers2 transformers

The build fails at Step 6/7:

Step 6/7 : RUN echo "Downloading ${CUDNN_DEB}" && mkdir /tmp/cudnn && cd /tmp/cudnn && wget --quiet --show-progress --progress=bar:force:noscroll ${CUDNN_URL} && dpkg -i *.deb && cp /var/cudnn-local-tegra-repo-*/cudnn-local-tegra-*-keyring.gpg /usr/share/keyrings/ && apt-get update && apt-cache search cudnn && apt-get install -y --no-install-recommends ${CUDNN_PACKAGES} && rm -rf /var/lib/apt/lists/* && apt-get clean && dpkg --list | grep cudnn && dpkg -P ${CUDNN_DEB} && rm -rf /tmp/cudnn
 ---> Running in ade28116be0f
Downloading cudnn-local-tegra-repo-ubuntu2204-8.9.4.25
mkdir: cannot create directory ‘/tmp/cudnn’: File exists
The command '/bin/bash -c echo "Downloading ${CUDNN_DEB}" && mkdir /tmp/cudnn && cd /tmp/cudnn && wget --quiet --show-progress --progress=bar:force:noscroll ${CUDNN_URL} && dpkg -i *.deb && cp /var/cudnn-local-tegra-repo-*/cudnn-local-tegra-*-keyring.gpg /usr/share/keyrings/ && apt-get update && apt-cache search cudnn && apt-get install -y --no-install-recommends ${CUDNN_PACKAGES} && rm -rf /var/lib/apt/lists/* && apt-get clean && dpkg --list | grep cudnn && dpkg -P ${CUDNN_DEB} && rm -rf /tmp/cudnn' returned a non-zero code: 1
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/user/jetson-containers/jetson_containers/build.py", line 112, in <module>
    build_container(args.name, args.packages, args.base, args.build_flags, args.build_args, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api, args.skip_packages)
  File "/home/user/jetson-containers/jetson_containers/container.py", line 147, in build_container
    status = subprocess.run(cmd.replace(_NEWLINE_, ' '), executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'DOCKER_BUILDKIT=0 docker build --network=host --tag humble_transformers2:r36.3.0-cudnn --file /home/user/jetson-containers/packages/cuda/cudnn/Dockerfile --build-arg BASE_IMAGE=humble_transformers2:r36.3.0-cuda_12.2 --build-arg CUDNN_URL="https://nvidia.box.com/shared/static/ht4li6b0j365ta7b76a6gw29rk5xh8cy.deb" --build-arg CUDNN_DEB="cudnn-local-tegra-repo-ubuntu2204-8.9.4.25" --build-arg CUDNN_PACKAGES="libcudnn*-dev libcudnn*-samples" /home/user/jetson-containers/packages/cuda/cudnn 2>&1 | tee /home/user/jetson-containers/logs/20241122_094232/build/humble_transformers2_r36.3.0-cudnn.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.

I added -p to the mkdir on line 16 of jetson-containers/packages/cuda/cudnn/Dockerfile:

mkdir -p /tmp/cudnn && cd /tmp/cudnn && \

That seems to have let the build continue to the next stage.
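The difference between the two commands can be illustrated in Python. Here os.mkdir behaves like plain mkdir, and os.makedirs(..., exist_ok=True) behaves like mkdir -p. The following is a minimal sketch. The leftover directory is simulated in a temporary folder, on the assumption (not confirmed in this thread) that /tmp/cudnn already existed in the base image. try_mkdir is a hypothetical name:

```python
import os
import tempfile


def try_mkdir(path: str, like_dash_p: bool) -> bool:
    """Return True if creating `path` succeeds, False if it fails the way
    plain `mkdir` does on an existing directory."""
    try:
        if like_dash_p:
            os.makedirs(path, exist_ok=True)  # like `mkdir -p`: fine if it exists
        else:
            os.mkdir(path)  # like plain `mkdir`: raises if it exists
        return True
    except FileExistsError:
        return False


with tempfile.TemporaryDirectory() as root:
    cudnn_dir = os.path.join(root, "cudnn")  # stands in for /tmp/cudnn
    os.mkdir(cudnn_dir)  # simulate a directory left over in the base image
    print(try_mkdir(cudnn_dir, like_dash_p=False))  # False
    print(try_mkdir(cudnn_dir, like_dash_p=True))   # True
```

One caveat applies to -p. If /tmp/cudnn already contains files, the later `dpkg -i *.deb` in the same RUN step would also install any stale .deb files left there. Clearing the directory before creating it (`rm -rf /tmp/cudnn` first) is a stricter alternative.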

H