dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

ROS2 Humble Nano, file version.json not found #308

Closed. PaddyCube closed this issue 10 months ago.

PaddyCube commented 11 months ago

Hello,

I'm trying to run ROS2 Humble on my Jetson Nano, so I pulled the humble-desktop-l4t-r32.7.1 container. When I run

./run.sh $(./autotag humble-desktop-l4t-r32.7.1)

I get the following error

  File "/usr/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/ros/docker/jetson-containers/jetson_containers/__init__.py", line 7, in <module>
    from .logging import *
  File "/home/ros/docker/jetson-containers/jetson_containers/logging.py", line 5, in <module>
    from .packages import _PACKAGE_ROOT
  File "/home/ros/docker/jetson-containers/jetson_containers/packages.py", line 11, in <module>
    from .l4t_version import L4T_VERSION
  File "/home/ros/docker/jetson-containers/jetson_containers/l4t_version.py", line 244, in <module>
    CUDA_VERSION = get_cuda_version()
  File "/home/ros/docker/jetson-containers/jetson_containers/l4t_version.py", line 165, in get_cuda_version
    raise IOError(f"L4T_VERSION file doesn't exist:  {version_file}")
OSError: L4T_VERSION file doesn't exist:  /usr/local/cuda/version.json
-- Error:  return code 1

So it can't find version.json under /usr/local/cuda, which is correct: only a version.txt file exists there instead.

So what should version.json look like?
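
For reference, a quick way to see which CUDA version file is actually present (a hedged sketch; on JetPack 4.x / CUDA 10.2 only version.txt appears to ship, while newer CUDA toolkits add version.json):

# Sketch: print whichever CUDA version file exists on this system
if [ -f /usr/local/cuda/version.json ]; then
    cat /usr/local/cuda/version.json     # newer CUDA toolkits store JSON metadata here
elif [ -f /usr/local/cuda/version.txt ]; then
    cat /usr/local/cuda/version.txt      # e.g. "CUDA Version 10.2.300" on JetPack 4.x
else
    nvcc --version                       # fall back to the compiler banner
fi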

dusty-nv commented 11 months ago

Thanks for reporting this @PaddyCube, will look into it 👍

What version of JetPack-L4T are you running? You can check this with cat /etc/nv_tegra_release
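
For example, assuming the usual first line of that file ("# R32 (release), REVISION: 7.1, ..."), the L4T version can be pulled out with a one-liner (a sketch, not the repo's own parsing code):

# Sketch: extract e.g. "32.7.1" from the first line of /etc/nv_tegra_release
head -n1 /etc/nv_tegra_release | sed -E 's/^# R([0-9]+) \(release\), REVISION: ([0-9.]+),.*/\1.\2/'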

In the meantime, you should be able to start this container manually with a command like the one below:

sudo docker run --runtime nvidia -it --rm --network=host dustynv/ros:humble-desktop-l4t-r32.7.1

https://github.com/dusty-nv/jetson-containers/tree/master/packages/ros#user-content-run

PaddyCube commented 10 months ago

I tried with JetPack 4.3. I also tried to run the container with the command you provided, without success. I can't reproduce it right now as I recently re-flashed with 4.6.4. I'll let you know.

PaddyCube commented 10 months ago

With the latest JetPack, it runs without any error if I start the container manually. With the run.sh script, however, it doesn't:

ros@ubuntu-jetson:~/docker/jetson-containers$ ./run.sh $(./autotag humble-desktop-l4t-r32.7.1)
Namespace(disable=[''], output='/tmp/autotag', packages=['humble-desktop-l4t-r32.7.1'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- L4T_VERSION=32.7.4  JETPACK_VERSION=4.6.4  CUDA_VERSION=10.2.300
-- Finding compatible container image for ['humble-desktop-l4t-r32.7.1']

Couldn't find a compatible container for humble-desktop-l4t-r32.7.1, would you like to build it? [y/N]
PaddyCube commented 10 months ago

My fault, the tag must be ros, so this works as expected:

./run.sh $(./autotag ros)

astringfield commented 9 months ago

@dusty-nv is this issue resolved? I'm encountering a similar error when trying to build the following container:

$ ./build.sh pytorch torchvision zed

This fails when testing the torchvision:r35.3.1-cuda container:

cat: '/usr/local/cuda/version*': No such file or directory

Extended output:

-- Testing container torchvision:r35.3.1-cuda (cuda:11.4/test.sh)

docker run -t --rm --runtime=nvidia --network=host \
--volume /home/perception/repositories/jetson-containers/packages/cuda/cuda:/test \
--volume /home/perception/repositories/jetson-containers/data:/data \
--workdir /test \
torchvision:r35.3.1-cuda \
/bin/bash -c '/bin/bash test.sh' \
2>&1 | tee /home/perception/repositories/jetson-containers/logs/20231211_124109/test/torchvision_r35.3.1-cuda_test.sh.txt; exit ${PIPESTATUS[0]}

cat: '/usr/local/cuda/version*': No such file or directory
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/perception/repositories/jetson-containers/jetson_containers/build.py", line 102, in <module>
    build_container(args.name, args.packages, args.base, args.build_flags, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api)
  File "/home/perception/repositories/jetson-containers/jetson_containers/container.py", line 148, in build_container
    test_container(container_name, pkg, simulate)
  File "/home/perception/repositories/jetson-containers/jetson_containers/container.py", line 320, in test_container
    status = subprocess.run(cmd.replace(_NEWLINE_, ' '), executable='/bin/bash', shell=True, check=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/perception/anaconda3/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'docker run -t --rm --runtime=nvidia --network=host --volume /home/perception/repositories/jetson-containers/packages/cuda/cuda:/test --volume /home/perception/repositories/jetson-containers/data:/data --workdir /test torchvision:r35.3.1-cuda /bin/bash -c '/bin/bash test.sh' 2>&1 | tee /home/perception/repositories/jetson-containers/logs/20231211_124109/test/torchvision_r35.3.1-cuda_test.sh.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.

Changing the build order

I've tried changing the package build order, but it produces the same result.

Building torch and zed separately

I can successfully build the torch/torchvision and zed containers separately:

$ ./build.sh zed  # completes successfully
$ ./build.sh pytorch torchvision  # completes successfully

Somewhere in the process of combining the Torch and ZED packages, the /usr/local/cuda/version.json file goes missing.
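
A quick hedged check (using the torchvision:r35.3.1-cuda tag from the log above) is to list the CUDA install inside the freshly built image:

# Sketch: verify whether the CUDA toolkit is still present in the built image
docker run --rm --runtime=nvidia torchvision:r35.3.1-cuda ls -l /usr/local/cuda/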

I appreciate any help!

dusty-nv commented 9 months ago

Somewhere in the process of combining the Torch and ZED packages the /usr/local/cuda/version.json file goes missing.

Hi @astringfield, this is a different error, but the ZED dockerfile was doing apt-get autoremove (presumably following the Stereolabs docs), and it was removing CUDA stuff. I had only recently added more tests, which is what triggered the error. Just fixed this in commit https://github.com/dusty-nv/jetson-containers/commit/93739890ec4f575c113e0c8cfad76a6e4e1a51d3
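
For anyone hitting something similar before pulling that fix, one generic way to keep an apt-get autoremove step from stripping CUDA packages (a sketch only, not the change made in the linked commit) is to mark them as manually installed first:

# Sketch: mark CUDA/cuDNN packages as manually installed so that a later
# 'apt-get autoremove' no longer treats them as removable dependencies
dpkg-query -W -f '${Package}\n' 'cuda-*' 'libcudnn*' | xargs -r apt-mark manual
apt-get autoremove -y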

astringfield commented 9 months ago

@dusty-nv thank you for taking a look at this so quickly - I pulled the commit and was able to build the combined Torch/ZED container successfully :+1: