dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

building container with ROS:humble desktop and python 3.11 (probably CUDA enabled pytorch but I didn't get that far) #555

Open kyle-redyeti opened 3 weeks ago

kyle-redyeti commented 3 weeks ago

This is my first attempt at using jetson-containers build, so I am probably doing something wrong or doing this the hard way, but any direction you can provide will be greatly appreciated.

I attempted to "roll my own" NVIDIA container with the following command:

$ jetson-containers build --name=humble_desktop_cpp311 python:3.11 ros:humble-desktop

[It took a long time to build. I am not sure whether I missed any other errors or warnings, but it ended with an error...]

"""
Setting up python3-pytest-mock (1.10.4-3) ...

All required rosdeps installed successfully

Traceback (most recent call last):
  File "/usr/bin/colcon", line 33, in <module>
    sys.exit(load_entry_point('colcon-core==0.16.1', 'console_scripts', 'colcon')())
  File "/usr/lib/python3/dist-packages/colcon_core/command.py", line 118, in main
    return _main(command_name=command_name, argv=argv)
  File "/usr/lib/python3/dist-packages/colcon_core/command.py", line 160, in _main
    args = parser.parse_args(args=argv)
  File "/usr/lib/python3/dist-packages/colcon_defaults/argument_parser/defaults.py", line 166, in parse_args
    return self._parser.parse_args(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/colcon_argcomplete/argument_parser/argcomplete/__init__.py", line 85, in parse_args
    from argcomplete import autocomplete
ModuleNotFoundError: No module named 'argcomplete'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 72, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 32, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 12, in <module>
    import os, glob, subprocess, os.path, time, pwd, sys, requests_unixsocket
  File "/usr/lib/python3/dist-packages/requests_unixsocket/__init__.py", line 1, in <module>
    import requests
  File "/usr/lib/python3/dist-packages/requests/__init__.py", line 43, in <module>
    import urllib3
ModuleNotFoundError: No module named 'urllib3'

Original exception was:
Traceback (most recent call last):
  File "/usr/bin/colcon", line 33, in <module>
    sys.exit(load_entry_point('colcon-core==0.16.1', 'console_scripts', 'colcon')())
  File "/usr/lib/python3/dist-packages/colcon_core/command.py", line 118, in main
    return _main(command_name=command_name, argv=argv)
  File "/usr/lib/python3/dist-packages/colcon_core/command.py", line 160, in _main
    args = parser.parse_args(args=argv)
  File "/usr/lib/python3/dist-packages/colcon_defaults/argument_parser/defaults.py", line 166, in parse_args
    return self._parser.parse_args(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/colcon_argcomplete/argument_parser/argcomplete/__init__.py", line 85, in parse_args
    from argcomplete import autocomplete
ModuleNotFoundError: No module named 'argcomplete'
The command '/bin/bash -c ./ros2_build.sh' returned a non-zero code: 1
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/xavier/Documents/Git/jetson-containers/jetson_containers/build.py", line 103, in <module>
    build_container(args.name, args.packages, args.base, args.build_flags, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api)
  File "/home/xavier/Documents/Git/jetson-containers/jetson_containers/container.py", line 143, in build_container
    status = subprocess.run(cmd.replace(NEWLINE, ' '), executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'DOCKER_BUILDKIT=0 docker build --network=host --tag humble_desktop_cpp311:l4t-r35.5.0-ros_humble-desktop --file /home/xavier/Documents/Git/jetson-containers/packages/ros/Dockerfile.ros2 --build-arg BASE_IMAGE=humble_desktop_cpp311:l4t-r35.5.0-cmake --build-arg ROS_VERSION="humble" --build-arg ROS_PACKAGE="desktop" /home/xavier/Documents/Git/jetson-containers/packages/ros 2>&1 | tee /home/xavier/Documents/Git/jetson-containers/logs/20240611_100613/build/humble_desktop_cpp311_l4t-r35.5.0-ros_humble-desktop.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.
"""

This is the stack of containers it went through...

"""
REPOSITORY             TAG                          IMAGE ID      CREATED            SIZE

<none>                 <none>                       954191dd8300  About an hour ago  11.3GB
humble_desktop_cpp311  l4t-r35.5.0-cmake            38168db051e3  About an hour ago  11.3GB
humble_desktop_cpp311  l4t-r35.5.0-opencv           fe47ee392bc4  About an hour ago  11.2GB
humble_desktop_cpp311  l4t-r35.5.0-numpy            936e66829051  4 hours ago        10.2GB
humble_desktop_cpp311  l4t-r35.5.0-cuda             aaec478f221f  4 hours ago        10.2GB
humble_desktop_cpp311  l4t-r35.5.0-cudnn            aaec478f221f  4 hours ago        10.2GB
humble_desktop_cpp311  l4t-r35.5.0-tensorrt         aaec478f221f  4 hours ago        10.2GB
humble_desktop_cpp311  l4t-r35.5.0-python_3.11      c455d7dbfcfd  4 hours ago        10.2GB
<none>                 <none>                       eded2d895b58  4 hours ago        10GB
humble_desktop_cpp311  l4t-r35.5.0-build-essential  6a61737c35e5  4 hours ago        10GB
humble_desktop_cpp310  l4t-r35.5.0-build-essential  6a61737c35e5  4 hours ago        10GB
"""

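In a log this long, the imports that actually failed are easy to lose among the chained tracebacks. A minimal sketch for pulling them out of a saved build log; `missing_modules` is a hypothetical helper (not part of jetson-containers), and the sample text is copied from the output above:

```python
import re
from typing import List

# Final line that Python prints when an import cannot be resolved.
MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")


def missing_modules(log_text: str) -> List[str]:
    """Return the distinct missing module names, in first-seen order."""
    seen: List[str] = []
    for name in MISSING_MODULE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample = (
    "    from argcomplete import autocomplete\n"
    "ModuleNotFoundError: No module named 'argcomplete'\n"
    "Error in sys.excepthook:\n"
    "    import urllib3\n"
    "ModuleNotFoundError: No module named 'urllib3'\n"
    "Original exception was:\n"
    "ModuleNotFoundError: No module named 'argcomplete'\n"
)

print(missing_modules(sample))
```

To run it on a real log, read one of the files under the `logs/` directory named in the error message and pass its contents to the function.
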
kyle-redyeti commented 3 weeks ago

I thought I was being tricky and found another way that might work, but it also ended with an error.

I tried to use this command instead:

$ PYTHON_VERSION=3.11 PYTORCH_VERSION=2.3 jetson-containers build ros:humble-desktop

Looks like the same error here...

0 upgraded, 0 newly installed, 0 to remove and 99 not upgraded.

All required rosdeps installed successfully

Traceback (most recent call last):
  File "/usr/bin/colcon", line 33, in <module>
    sys.exit(load_entry_point('colcon-core==0.16.1', 'console_scripts', 'colcon')())
  File "/usr/lib/python3/dist-packages/colcon_core/command.py", line 118, in main
    return _main(command_name=command_name, argv=argv)
  File "/usr/lib/python3/dist-packages/colcon_core/command.py", line 160, in _main
    args = parser.parse_args(args=argv)
  File "/usr/lib/python3/dist-packages/colcon_defaults/argument_parser/defaults.py", line 166, in parse_args
    return self._parser.parse_args(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/colcon_argcomplete/argument_parser/argcomplete/__init__.py", line 85, in parse_args
    from argcomplete import autocomplete
ModuleNotFoundError: No module named 'argcomplete'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 72, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 32, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 12, in <module>
    import os, glob, subprocess, os.path, time, pwd, sys, requests_unixsocket
  File "/usr/lib/python3/dist-packages/requests_unixsocket/__init__.py", line 1, in <module>
    import requests
  File "/usr/lib/python3/dist-packages/requests/__init__.py", line 43, in <module>
    import urllib3
ModuleNotFoundError: No module named 'urllib3'

Original exception was:
Traceback (most recent call last):
  File "/usr/bin/colcon", line 33, in <module>
    sys.exit(load_entry_point('colcon-core==0.16.1', 'console_scripts', 'colcon')())
  File "/usr/lib/python3/dist-packages/colcon_core/command.py", line 118, in main
    return _main(command_name=command_name, argv=argv)
  File "/usr/lib/python3/dist-packages/colcon_core/command.py", line 160, in _main
    args = parser.parse_args(args=argv)
  File "/usr/lib/python3/dist-packages/colcon_defaults/argument_parser/defaults.py", line 166, in parse_args
    return self._parser.parse_args(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/colcon_argcomplete/argument_parser/argcomplete/__init__.py", line 85, in parse_args
    from argcomplete import autocomplete
ModuleNotFoundError: No module named 'argcomplete'
The command '/bin/bash -c ./ros2_build.sh' returned a non-zero code: 1
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/xavier/Documents/Git/jetson-containers/jetson_containers/build.py", line 103, in <module>
    build_container(args.name, args.packages, args.base, args.build_flags, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api)
  File "/home/xavier/Documents/Git/jetson-containers/jetson_containers/container.py", line 143, in build_container
    status = subprocess.run(cmd.replace(NEWLINE, ' '), executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'DOCKER_BUILDKIT=0 docker build --network=host --tag ros:humble-desktop-l4t-r35.5.0-ros_humble-desktop --file /home/xavier/Documents/Git/jetson-containers/packages/ros/Dockerfile.ros2 --build-arg BASE_IMAGE=ros:humble-desktop-l4t-r35.5.0-cmake --build-arg ROS_VERSION="humble" --build-arg ROS_PACKAGE="desktop" /home/xavier/Documents/Git/jetson-containers/packages/ros 2>&1 | tee /home/xavier/Documents/Git/jetson-containers/logs/20240611_143723/build/ros_humble-desktop-l4t-r35.5.0-ros_humble-desktop.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.

These are the images it created (very similar to last time):

<none>  <none>                               bad0aa1debb3  55 minutes ago  11.3GB
ros     humble-desktop-l4t-r35.5.0-cmake     8a3510b2c3a3  55 minutes ago  11.3GB
ros     humble-desktop-l4t-r35.5.0-opencv    433e848729f4  56 minutes ago  11.2GB
ros     humble-desktop-l4t-r35.5.0-numpy     f0b106eeb093  4 hours ago     10.2GB
ros     humble-desktop-l4t-r35.5.0-python    06213e637c11  4 hours ago     10.2GB
ros     humble-desktop-l4t-r35.5.0-tensorrt  06213e637c11  4 hours ago     10.2GB
ros     humble-desktop-l4t-r35.5.0-cuda      a8aeaffc7171  4 hours ago     10GB
ros     humble-desktop-l4t-r35.5.0-cudnn     a8aeaffc7171  4 hours ago     10GB

kyle-redyeti commented 3 weeks ago

I thought maybe my version of jetson-containers was out of date. Now the build seems to die somewhere else...

copying _skbuild/linux-aarch64-3.11/cmake-install/share/opencv4/haarcascades/haarcascade_smile.xml -> _skbuild/linux-aarch64-3.11/cmake-install/cv2/data/haarcascade_smile.xml
copying _skbuild/linux-aarch64-3.11/cmake-install/share/opencv4/haarcascades/haarcascade_upperbody.xml -> _skbuild/linux-aarch64-3.11/cmake-install/cv2/data/haarcascade_upperbody.xml
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/usr/local/lib/python3.11/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
    return _build_backend().build_wheel(wheel_directory, config_settings,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/pip-build-env-l6raaq6r/overlay/local/lib/python3.11/dist-packages/setuptools/build_meta.py", line 230, in build_wheel
    return self._build_with_temp_dir(['bdist_wheel'], '.whl',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/pip-build-env-l6raaq6r/overlay/local/lib/python3.11/dist-packages/setuptools/build_meta.py", line 215, in _build_with_temp_dir
    self.run_setup()
  File "/tmp/pip-build-env-l6raaq6r/overlay/local/lib/python3.11/dist-packages/setuptools/build_meta.py", line 268, in run_setup
    self).run_setup(setup_script=setup_script)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/pip-build-env-l6raaq6r/overlay/local/lib/python3.11/dist-packages/setuptools/build_meta.py", line 158, in run_setup
    exec(compile(code, __file__, 'exec'), locals())
  File "setup.py", line 547, in <module>
    main()
  File "setup.py", line 277, in main
    setup(
  File "/tmp/pip-build-env-l6raaq6r/overlay/local/lib/python3.11/dist-packages/skbuild/setuptools_wrap.py", line 706, in setup
    _classify_installed_files(
  File "setup.py", line 460, in _classify_installed_files_override
    raise Exception("Not found: '%s'" % relpath_re)
Exception: Not found: 'python/cv2/mat_wrapper/..py'
error: subprocess-exited-with-error

× Building wheel for opencv-contrib-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /usr/bin/python3.11 /usr/local/lib/python3.11/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp27y1yieh
cwd: /opt/opencv-python
Building wheel for opencv-contrib-python (pyproject.toml): finished with status 'error'
ERROR: Failed building wheel for opencv-contrib-python
Failed to build opencv-contrib-python
ERROR: Failed to build one or more wheels
The command '/bin/sh -c cd /tmp/opencv && ./install.sh || ./build.sh' returned a non-zero code: 1
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/xavier/Documents/Git/jetson-containers/jetson_containers/build.py", line 103, in <module>
    build_container(args.name, args.packages, args.base, args.build_flags, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api)
  File "/home/xavier/Documents/Git/jetson-containers/jetson_containers/container.py", line 143, in build_container
    status = subprocess.run(cmd.replace(NEWLINE, ' '), executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'DOCKER_BUILDKIT=0 docker build --network=host --tag ros:humble-desktop-l4t-r35.5.0-opencv_deb --file /home/xavier/Documents/Git/jetson-containers/packages/opencv/Dockerfile --build-arg BASE_IMAGE=ros:humble-desktop-l4t-r35.5.0-numpy --build-arg OPENCV_VERSION="4.5.0" --build-arg OPENCV_PYTHON="4.x" --build-arg CUDA_ARCH_BIN="7.2,8.7" --build-arg OPENCV_URL="https://nvidia.box.com/shared/static/2hssa5g3v28ozvo3tc3qwxmn78yerca9.gz" /home/xavier/Documents/Git/jetson-containers/packages/opencv 2>&1 | tee /home/xavier/Documents/Git/jetson-containers/logs/20240611_182631/build/ros_humble-desktop-l4t-r35.5.0-opencv_deb.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.
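The `docker build` command quoted in the `CalledProcessError` already says which stage failed and which image it was built on top of. A small sketch for reading those fields out of the command string; `failing_stage` is a hypothetical helper, and the sample command is an abridged copy of the one above:

```python
import re
from typing import Dict, Optional


def failing_stage(command: str) -> Dict[str, Optional[str]]:
    """Pick the tag, Dockerfile and base image out of a docker build command."""
    patterns = {
        "tag": r"--tag\s+(\S+)",
        "dockerfile": r"--file\s+(\S+)",
        "base_image": r"--build-arg\s+BASE_IMAGE=(\S+)",
    }
    result: Dict[str, Optional[str]] = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, command)
        # A field that does not appear in the command is reported as None.
        result[key] = match.group(1) if match else None
    return result


# Abridged from the error above: some --build-arg options are left out.
command = (
    "DOCKER_BUILDKIT=0 docker build --network=host "
    "--tag ros:humble-desktop-l4t-r35.5.0-opencv_deb "
    "--file /home/xavier/Documents/Git/jetson-containers/packages/opencv/Dockerfile "
    "--build-arg BASE_IMAGE=ros:humble-desktop-l4t-r35.5.0-numpy "
    '--build-arg OPENCV_VERSION="4.5.0" '
    "/home/xavier/Documents/Git/jetson-containers/packages/opencv"
)

stage = failing_stage(command)
print(stage["tag"], "was being built on top of", stage["base_image"])
```

Here the failing stage is the `opencv_deb` one, built on the `numpy` image, so this failure happens earlier in the chain than the colcon error in the previous two attempts.
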

dusty-nv commented 3 weeks ago

@kyle-redyeti it looks like you were initially running into issues with the colcon build system that ROS uses. My guess is that Humble ordinarily sticks to Python 3.10 in its tier 1/2 support. With OpenCV, I have had trouble building it against the latest CUDA/cuDNN/Python, and those errors are similar to what I have been seeing in https://github.com/opencv/opencv/issues/24983
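
One way to probe the Python-version guess is to ask the interpreter inside the failing image whether it can see the modules named in the colcon traceback. A minimal sketch, assuming it is run with the same `python3` that launches colcon in that image; `check_modules` is a hypothetical helper, not part of colcon or jetson-containers:

```python
import importlib.util
import sys
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Show which interpreter is answering, then the two modules from the traceback.
print(sys.version)
for name, found in check_modules(["argcomplete", "urllib3"]).items():
    print(name, "found" if found else "MISSING")
```

If both come back missing under Python 3.11 but are found under the image's stock Python, that would be consistent with the interpreter swap being the cause; it does not prove it.
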

kyle-redyeti commented 2 weeks ago

@dusty-nv OK, I have tried several rounds of building this. I even tried to add ROS2 to text-generation-webui:r35.4.1-cp310. If I buy a Jetson Orin NX, would the ROS2 containers build on Ubuntu 22.04 (Jammy), so that I would already start with Python 3.10 and a CUDA-enabled version of PyTorch? I think I am at the point where I have exhausted my options and patience with trying to keep using "OLD HARDWARE" (Jetson Xavier AGX). If you say it will be smooth sailing, I will order one as soon as I see your response (plus I could then use the NanoLLM stuff that I was not able to run on my Xavier either).