dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Build with ROS Humble and Pytorch fails subprocess-exited-with-error #325

Closed: yuataka closed this issue 9 months ago

yuataka commented 9 months ago

I tried to build an image with ROS Humble and PyTorch on a Jetson Orin Nano, but the build failed.

L4T 35.4.1

When I run ./build.sh --name=ros_humble_test ros:humble-desktop pytorch, the following error occurs:

  × Building wheel for onnx (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpli_l44nj
  cwd: /tmp/pip-req-build-b6s7_y5y
  Building wheel for onnx (pyproject.toml): finished with status 'error'
  ERROR: Failed building wheel for onnx
Failed to build onnx
ERROR: Could not build wheels for onnx, which is required to install pyproject.toml-based projects
The command '/bin/bash -c pip3 install --no-cache-dir --verbose git+https://github.com/onnx/onnx@${ONNX_VERSION}' returned a non-zero code: 1

Logs:

ros_humble_test_r35.4.1-onnx.sh

#!/usr/bin/env bash

docker build --network=host --tag ros_humble_test:r35.4.1-onnx \
--file /home/araya/projects/jetson-containers/packages/onnx/Dockerfile \
--build-arg BASE_IMAGE=ros_humble_test:r35.4.1-ros_humble-desktop \
--build-arg ONNX_VERSION="main" \
/home/araya/projects/jetson-containers/packages/onnx \
2>&1 | tee /home/araya/projects/jetson-containers/logs/20231107_173445/build/ros_humble_test_r35.4.1-onnx.txt; exit ${PIPESTATUS[0]}

ros_humble_test_r35.4.1-onnx.txt

Could you give me some advice about this problem?

dusty-nv commented 9 months ago

Hi @yuataka, I rebuilt the onnx package a few days ago and didn't see this, but it's possible that the onnx main branch is having build problems at the moment. So you may want to roll back ONNX_VERSION here:

https://github.com/dusty-nv/jetson-containers/blob/7f2a9dcc116ccf14e5b95776019e0a10cddf9336/packages/onnx/config.py#L6

yuataka commented 9 months ago

Hi @dusty-nv, thank you for your advice. I changed ONNX_VERSION from 'main' to 'v1.15.0':

    package['build_args'] = {'ONNX_VERSION': 'v1.15.0'}

However, the build still failed. Should I try another onnx version? Thanks for your advice.

ros_humble_test_r35.4.1-onnx.sh

#!/usr/bin/env bash

docker build --network=host --tag ros_humble_test:r35.4.1-onnx \
--file /home/araya/projects/jetson-containers/packages/onnx/Dockerfile \
--build-arg BASE_IMAGE=ros_humble_test:r35.4.1-ros_humble-desktop \
--build-arg ONNX_VERSION="v1.15.0" \
/home/araya/projects/jetson-containers/packages/onnx \
2>&1 | tee /home/araya/projects/jetson-containers/logs/20231109_134938/build/ros_humble_test_r35.4.1-onnx.txt; exit ${PIPESTATUS[0]}

ros_humble_test_r35.4.1-onnx.txt
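Comparing the two generated scripts, the only substantive difference is the ONNX_VERSION build argument ("main" versus "v1.15.0"). The value set in config.py is passed through to the onnx Dockerfile's pip3 install git+https://github.com/onnx/onnx@${ONNX_VERSION} step, which is why pinning it changed the command but, in this case, not the outcome. The sketch below illustrates that flow under an assumption: build_arg_flags() and docker_build_cmd() are hypothetical helpers that reproduce the flag layout seen in the logs, not jetson-containers' actual code.

```python
# Hypothetical sketch: render a package's build_args dict (as set in config.py)
# into docker build flags with the same layout as the logged .sh scripts.
# These helper names are assumptions, not part of jetson-containers.


def build_arg_flags(build_args: dict) -> str:
    """Render {'ONNX_VERSION': 'v1.15.0'} as '--build-arg ONNX_VERSION="v1.15.0"'."""
    return " ".join(f'--build-arg {key}="{value}"' for key, value in build_args.items())


def docker_build_cmd(tag: str, dockerfile: str, base_image: str,
                     build_args: dict, context: str) -> str:
    """Assemble a docker build command in the same flag order as the logs."""
    parts = [
        "docker build --network=host",
        f"--tag {tag}",
        f"--file {dockerfile}",
        f"--build-arg BASE_IMAGE={base_image}",  # BASE_IMAGE is unquoted in the logs
        build_arg_flags(build_args),             # empty string if there are no args
        context,
    ]
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    package = {"build_args": {"ONNX_VERSION": "v1.15.0"}}  # the change tried above
    print(docker_build_cmd(
        tag="ros_humble_test:r35.4.1-onnx",
        dockerfile="packages/onnx/Dockerfile",
        base_image="ros_humble_test:r35.4.1-ros_humble-desktop",
        build_args=package["build_args"],
        context="packages/onnx",
    ))
```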

yuataka commented 9 months ago

I built only PyTorch with the command below, and it succeeded. However, the build with both PyTorch and ROS 2 Humble still fails.

./build.sh --name=pytorch_test pytorch

mzahana commented 9 months ago

@dusty-nv

I would like to use the dustynv/ros:humble-pytorch-l4t-r35.4.1 image, but I could not find it in the list here.

I need to use it on an Orin NX with the latest JetPack 5.1.2.

Can it be built?

Thanks

mzahana commented 9 months ago

When I executed the command

./build.sh --name=ros_humble_pytorch ros:humble-desktop pytorch

Part of the output is

-- Package pytorch:1.10 isn't compatible with L4T r35.4.1 (requires L4T ==32.*)
-- Package pytorch:1.10 was disabled by its config
-- Package pytorch:1.9 isn't compatible with L4T r35.4.1 (requires L4T ==32.*)
-- Package pytorch:1.9 was disabled by its config
-- Building containers  ['build-essential', 'python', 'cmake', 'numpy', 'opencv', 'ros:humble-desktop', 'onnx', 'pytorch']
-- Building container ros_humble_pytorch:r35.4.1-build-essential

Does this mean there is no PyTorch version available for r35.4.1?

dusty-nv commented 9 months ago

> -- Building containers  ['build-essential', 'python', 'cmake', 'numpy', 'opencv', 'ros:humble-desktop', 'onnx', 'pytorch']

@mzahana those other messages are normal; they refer to the JetPack 4 versions of the PyTorch packages. It also found other versions defined for JetPack 5. Did it start building the container after this?
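In other words, each PyTorch package version declares the L4T releases it supports, and versions whose requirement does not match the host are skipped rather than treated as errors. A minimal sketch of such a filter, assuming requirements take the ==32.* form shown in the log; l4t_matches() is a hypothetical name, and the real matching logic in jetson-containers may differ:

```python
# Hypothetical sketch of an L4T compatibility filter. Only the '==X.*' and
# exact '==X.Y.Z' requirement forms are handled.


def l4t_matches(l4t_version: str, requirement: str) -> bool:
    """Return True if an L4T version like 'r35.4.1' satisfies e.g. '==32.*'."""
    if not requirement.startswith("=="):
        raise ValueError(f"unsupported requirement: {requirement!r}")
    wanted = requirement[2:].split(".")
    have = l4t_version.lstrip("r").split(".")
    for want_part, have_part in zip(wanted, have):
        if want_part == "*":
            return True  # wildcard accepts the rest of the version
        if want_part != have_part:
            return False
    return len(wanted) == len(have)  # exact match must cover every component


if __name__ == "__main__":
    l4t = "r35.4.1"
    # Requirements taken from the log output above
    packages = {"pytorch:1.10": "==32.*", "pytorch:1.9": "==32.*"}
    for name, req in packages.items():
        if not l4t_matches(l4t, req):
            print(f"-- Package {name} isn't compatible with L4T {l4t} (requires L4T {req})")
```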

mzahana commented 9 months ago

@dusty-nv I simply want an image with ROS Humble and PyTorch for L4T 35.4.1.

How can I pull or build it?

Thanks.

dusty-nv commented 9 months ago

The command you already ran should start building it for you. Alternatively, you can pull dustynv/ros:humble-pytorch-l4t-r35.3.1 instead, or I think you could try this to simply add PyTorch on top of the existing humble-desktop container:

./build.sh --base=dustynv/ros:humble-desktop-l4t-r35.4.1 --name=ros_humble_pytorch pytorch

mzahana commented 9 months ago

@dusty-nv I tried building as you suggested:

./build.sh --base=dustynv/ros:humble-desktop-l4t-r35.4.1 --name=ros_humble_pytorch pytorch

and got the following error, which is related to building onnx:

#5 371.3   /tmp/pip-req-build-etjsc3ha/onnx/cpp2py_export.cc: In function ‘void onnx::pybind11_init_onnx_cpp2py_export(pybind11::module&)’:
#5 371.3   /tmp/pip-req-build-etjsc3ha/onnx/cpp2py_export.cc:165:15: error: ‘kw_only’ is not a member of ‘onnx::py’
#5 371.3     165 |           py::kw_only(),
#5 371.3         |               ^~~~~~~
#5 371.3   /tmp/pip-req-build-etjsc3ha/onnx/cpp2py_export.cc:223:15: error: ‘kw_only’ is not a member of ‘onnx::py’
#5 371.3     223 |           py::kw_only(),
#5 371.3         |               ^~~~~~~
#5 371.4   /tmp/pip-req-build-etjsc3ha/onnx/cpp2py_export.cc:306:15: error: ‘kw_only’ is not a member of ‘onnx::py’
#5 371.4     306 |           py::kw_only(),
#5 371.4         |               ^~~~~~~
#5 378.6   make[2]: *** [CMakeFiles/onnx_cpp2py_export.dir/build.make:76: CMakeFiles/onnx_cpp2py_export.dir/onnx/cpp2py_export.cc.o] Error 1
#5 378.6   make[1]: *** [CMakeFiles/Makefile2:1035: CMakeFiles/onnx_cpp2py_export.dir/all] Error 2
#5 378.6   make: *** [Makefile:136: all] Error 2
#5 378.6   Traceback (most recent call last):
#5 378.6     File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
#5 378.6       main()
#5 378.6     File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
#5 378.6       json_out['return_val'] = hook(**hook_input['kwargs'])
#5 378.6     File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
#5 378.6       return _build_backend().build_wheel(wheel_directory, config_settings,
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 434, in build_wheel
#5 378.6       return self._build_with_temp_dir(
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 419, in _build_with_temp_dir
#5 378.6       self.run_setup()
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 341, in run_setup
#5 378.6       exec(code, locals())
#5 378.6     File "<string>", line 327, in <module>
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 103, in setup
#5 378.6       return distutils.core.setup(**attrs)
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
#5 378.6       return run_commands(dist)
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
#5 378.6       dist.run_commands()
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
#5 378.6       self.run_command(cmd)
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
#5 378.6       super().run_command(command)
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
#5 378.6       cmd_obj.run()
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/normal/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 369, in run
#5 378.6       self.run_command("build")
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
#5 378.6       self.distribution.run_command(command)
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
#5 378.6       super().run_command(command)
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
#5 378.6       cmd_obj.run()
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 131, in run
#5 378.6       self.run_command(cmd_name)
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
#5 378.6       self.distribution.run_command(command)
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
#5 378.6       super().run_command(command)
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
#5 378.6       cmd_obj.run()
#5 378.6     File "<string>", line 264, in run
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
#5 378.6       self.distribution.run_command(command)
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
#5 378.6       super().run_command(command)
#5 378.6     File "/tmp/pip-build-env-vhem5ki3/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
#5 378.6       cmd_obj.run()
#5 378.6     File "<string>", line 243, in run
#5 378.6     File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
#5 378.6       raise CalledProcessError(retcode, cmd)
#5 378.6   subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '--build', '.', '--', '-j', '8']' returned non-zero exit status 2.
#5 378.7   error: subprocess-exited-with-error
#5 378.7   
#5 378.7   × Building wheel for onnx (pyproject.toml) did not run successfully.
#5 378.7   │ exit code: 1
#5 378.7   ╰─> See above for output.
#5 378.7   
#5 378.7   note: This error originates from a subprocess, and is likely not a problem with pip.
#5 378.7   full command: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpt73fjq7t
#5 378.7   cwd: /tmp/pip-req-build-etjsc3ha
#5 378.7   Building wheel for onnx (pyproject.toml): finished with status 'error'
#5 378.7   ERROR: Failed building wheel for onnx
#5 378.7 Failed to build onnx
#5 378.7 ERROR: Could not build wheels for onnx, which is required to install pyproject.toml-based projects
#5 ERROR: process "/bin/bash -c pip3 install --no-cache-dir --verbose git+https://github.com/onnx/onnx@${ONNX_VERSION}" did not complete successfully: exit code: 1
------
 > [2/3] RUN pip3 install --no-cache-dir --verbose git+https://github.com/onnx/onnx@main:
378.7   │ exit code: 1
378.7   ╰─> See above for output.
378.7   
378.7   note: This error originates from a subprocess, and is likely not a problem with pip.
378.7   full command: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpt73fjq7t
378.7   cwd: /tmp/pip-req-build-etjsc3ha
378.7   Building wheel for onnx (pyproject.toml): finished with status 'error'
378.7   ERROR: Failed building wheel for onnx
378.7 Failed to build onnx
378.7 ERROR: Could not build wheels for onnx, which is required to install pyproject.toml-based projects
------
Dockerfile:14
--------------------
  12 |     ARG ONNX_VERSION
  13 |     
  14 | >>> RUN pip3 install --no-cache-dir --verbose git+https://github.com/onnx/onnx@${ONNX_VERSION}
  15 |     #RUN pip3 install --no-cache-dir --verbose onnx
  16 |      
--------------------
ERROR: failed to solve: process "/bin/bash -c pip3 install --no-cache-dir --verbose git+https://github.com/onnx/onnx@${ONNX_VERSION}" did not complete successfully: exit code: 1
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/hunter/jetson-containers/jetson_containers/build.py", line 96, in <module>
    build_container(args.name, args.packages, args.base, args.build_flags, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api)
  File "/home/hunter/jetson-containers/jetson_containers/container.py", line 141, in build_container
    status = subprocess.run(cmd.replace(_NEWLINE_, ' '), executable='/bin/bash', shell=True, check=True)  
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'docker build --network=host --tag ros_humble_pytorch:r35.4.1-onnx --file /home/hunter/jetson-containers/packages/onnx/Dockerfile --build-arg BASE_IMAGE=ros_humble_pytorch:r35.4.1-cmake --build-arg ONNX_VERSION="main" /home/hunter/jetson-containers/packages/onnx 2>&1 | tee /home/hunter/jetson-containers/logs/20231113_203441/build/ros_humble_pytorch_r35.4.1-onnx.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.
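All three compiler errors point at py::kw_only(), which pybind11 first added in version 2.6.0. One plausible explanation, not confirmed in this thread, is that onnx's Python extension was compiled against older pybind11 headers present in the image. The hypothetical script below checks a pybind11 common.h for that; the header location you pass in is an assumption and depends on how pybind11 was installed.

```python
# Hypothetical diagnostic: report whether a pybind11 header set is new enough
# to provide py::kw_only() (added in pybind11 2.6.0).
# Usage (script name and path are assumptions):
#   python3 check_pybind11.py /path/to/pybind11/detail/common.h

import re
import sys

KW_ONLY_MIN = (2, 6)  # first pybind11 release with py::kw_only()


def pybind11_header_version(header_text: str) -> tuple:
    """Extract (major, minor) from the PYBIND11_VERSION_* macros."""
    major = re.search(r"#define\s+PYBIND11_VERSION_MAJOR\s+(\d+)", header_text)
    minor = re.search(r"#define\s+PYBIND11_VERSION_MINOR\s+(\d+)", header_text)
    if not (major and minor):
        raise ValueError("PYBIND11_VERSION_MAJOR/MINOR not found")
    return int(major.group(1)), int(minor.group(1))


def supports_kw_only(header_text: str) -> bool:
    """Tuple comparison, so 2.10 correctly counts as newer than 2.6."""
    return pybind11_header_version(header_text) >= KW_ONLY_MIN


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        major, minor = pybind11_header_version(text)
        status = "has" if supports_kw_only(text) else "LACKS"
        print(f"{path}: pybind11 {major}.{minor} {status} py::kw_only()")
```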
yuataka commented 9 months ago

After I downgraded L4T on the Orin Nano from 35.4.1 to 35.3.1, this build succeeded:

./build.sh --name=ros_humble_test ros:humble-desktop pytorch

Thank you, everyone.

dusty-nv commented 9 months ago

Interesting. I was unable to resolve the error ‘kw_only’ is not a member of ‘onnx::py’ (I think it may be pybind11-related) except by changing the build order to:

./build.sh --name=dustynv/ros:humble-desktop-pytorch pytorch ros:humble-desktop

Then it built and tested fine on 35.4.1. I pushed the image to dustynv/ros:humble-desktop-pytorch-l4t-r35.4.1.
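Build order can matter because the listed packages are built as a chain: each one is a separate docker build whose BASE_IMAGE is the image produced for the previous package (compare --tag ros_humble_test:r35.4.1-onnx with BASE_IMAGE=ros_humble_test:r35.4.1-ros_humble-desktop in the first log). With ros:humble-desktop resolved first, onnx is compiled on top of everything the ROS layer installed. A rough sketch of that chaining, using a hypothetical build_chain() helper and a placeholder starting image; it reproduces the tag pattern from the logs but is not the project's implementation:

```python
# Hypothetical sketch: turn an ordered package list into a chain of
# (tag, base_image) pairs, where each package builds on the previous image.


def build_chain(name: str, l4t: str, packages: list, base: str) -> list:
    """Return (tag, base_image) pairs in build order."""
    chain = []
    for pkg in packages:
        # 'ros:humble-desktop' becomes 'ros_humble-desktop' in the tag suffix
        tag = f"{name}:r{l4t}-{pkg.replace(':', '_')}"
        chain.append((tag, base))
        base = tag  # the next package starts from this image
    return chain


if __name__ == "__main__":
    # Resolved order printed earlier in this thread for `ros:humble-desktop pytorch`
    order = ["build-essential", "python", "cmake", "numpy", "opencv",
             "ros:humble-desktop", "onnx", "pytorch"]
    # "L4T_BASE_IMAGE" is a placeholder, not the real starting image
    for tag, base in build_chain("ros_humble_test", "35.4.1", order, "L4T_BASE_IMAGE"):
        print(f"{tag:45} <- {base}")
```

Listing pytorch first presumably resolves onnx ahead of the ROS layer, so it no longer builds against whatever the ROS image installed; the thread does not show the exact resolved order for that command.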

yuataka commented 9 months ago

Sorry, the command in my previous post is wrong. The command that succeeded is:

./build.sh --name=ros_humble_test pytorch ros:humble-desktop

So the build order may be what causes this problem. I will retry the build on 35.4.1 with this order.

yuataka commented 9 months ago

The build on 35.4.1 with this order succeeded. The environment is presumably the same as in my first build, so the order of the packages in the command seems to affect the build result.

./build.sh --name=ros_humble_test pytorch ros:humble-desktop

omerts commented 7 months ago

For anyone running into this (like me) who simply needs ROS Humble with PyTorch for r35.4.1, just use dustynv/ros:humble-desktop-pytorch-l4t-r35.4.1:

docker/run.sh -c dustynv/ros:humble-desktop-pytorch-l4t-r35.4.1

tonynajjar commented 7 months ago

Thanks! I also stumbled upon this issue. @dusty-nv, just out of curiosity, why do you build a ROS + PyTorch image specifically? I don't see many other package combinations being built.

dusty-nv commented 7 months ago

@tonynajjar it's because that combination kept coming up over the years: people wanted ROS + PyTorch (presumably to make ML/DNN nodes for ROS). Historically I had also included jetson-inference in it.