NVIDIA-AI-IOT / nanosam

A distilled Segment Anything (SAM) model capable of running real-time with NVIDIA TensorRT

Container build doesn't seem to work #24

Open · burningion opened 5 months ago

burningion commented 5 months ago

Hey there,

Thanks for the work on nanosam! I'm working on a project, and trying to run nanosam from the included Docker image.

I've added the line:

    --runtime nvidia \

to docker/23-01/run.sh so it'll run on the JetPack 6 Developer Preview.
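
For context, the run script now looks roughly like this (reconstructed from memory, so the image tag and the other flags are approximate, not the repo's exact contents):

    # docker/23-01/run.sh, with --runtime nvidia added so the container
    # uses the NVIDIA container runtime on JetPack 6 Developer Preview
    docker run \
        -it \
        --rm \
        --runtime nvidia \
        --network host \
        nanosam:23-01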

When I try to build the TensorRT engines for resnet18_image_encoder.onnx and mobile_sam_mask_decoder.onnx, I get the same error for both:

$ trtexec --onnx=data/mobile_sam_mask_decoder.onnx \
    --saveEngine=data/mobile_sam_mask_decoder.engine \
    --minShapes=point_coords:1x1x2,point_labels:1x1 \
    --optShapes=point_coords:1x1x2,point_labels:1x1 \
    --maxShapes=point_coords:1x10x2,point_labels:1x10

RUNNING TensorRT.trtexec [TensorRT v8502] # trtexec --onnx=data/mobile_sam_mask_decoder.onnx --saveEngine=data/mobile_sam_mask_decoder.engine --minShapes=point_coords:1x1x2,point_labels:1x1 --optShapes=point_coords:1x1x2,point_labels:1x1 --maxShapes=point_coords:1x10x2,point_labels:1x10
[03/29/2024-17:22:18] [I] === Model Options ===
[03/29/2024-17:22:18] [I] Format: ONNX
[03/29/2024-17:22:18] [I] Model: data/mobile_sam_mask_decoder.onnx
[03/29/2024-17:22:18] [I] Output:
[03/29/2024-17:22:18] [I] === Build Options ===
[03/29/2024-17:22:18] [I] Max batch: explicit batch
[03/29/2024-17:22:18] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[03/29/2024-17:22:18] [I] minTiming: 1
[03/29/2024-17:22:18] [I] avgTiming: 8
[03/29/2024-17:22:18] [I] Precision: FP32
[03/29/2024-17:22:18] [I] LayerPrecisions:
[03/29/2024-17:22:18] [I] Calibration:
[03/29/2024-17:22:18] [I] Refit: Disabled
[03/29/2024-17:22:18] [I] Sparsity: Disabled
[03/29/2024-17:22:18] [I] Safe mode: Disabled
[03/29/2024-17:22:18] [I] DirectIO mode: Disabled
[03/29/2024-17:22:18] [I] Restricted mode: Disabled
[03/29/2024-17:22:18] [I] Build only: Disabled
[03/29/2024-17:22:18] [I] Save engine: data/mobile_sam_mask_decoder.engine
[03/29/2024-17:22:18] [I] Load engine:
[03/29/2024-17:22:18] [I] Profiling verbosity: 0
[03/29/2024-17:22:18] [I] Tactic sources: Using default tactic sources
[03/29/2024-17:22:18] [I] timingCacheMode: local
[03/29/2024-17:22:18] [I] timingCacheFile:
[03/29/2024-17:22:18] [I] Heuristic: Disabled
[03/29/2024-17:22:18] [I] Preview Features: Use default preview flags.
[03/29/2024-17:22:18] [I] Input(s)s format: fp32:CHW
[03/29/2024-17:22:18] [I] Output(s)s format: fp32:CHW
[03/29/2024-17:22:18] [I] Input build shape: point_coords=1x1x2+1x1x2+1x10x2
[03/29/2024-17:22:18] [I] Input build shape: point_labels=1x1+1x1+1x10
[03/29/2024-17:22:18] [I] Input calibration shapes: model
[03/29/2024-17:22:18] [I] === System Options ===
[03/29/2024-17:22:18] [I] Device: 0
[03/29/2024-17:22:18] [I] DLACore:
[03/29/2024-17:22:18] [I] Plugins:
[03/29/2024-17:22:18] [I] === Inference Options ===
[03/29/2024-17:22:18] [I] Batch: Explicit
[03/29/2024-17:22:18] [I] Input inference shape: point_labels=1x1
[03/29/2024-17:22:18] [I] Input inference shape: point_coords=1x1x2
[03/29/2024-17:22:18] [I] Iterations: 10
[03/29/2024-17:22:18] [I] Duration: 3s (+ 200ms warm up)
[03/29/2024-17:22:18] [I] Sleep time: 0ms
[03/29/2024-17:22:18] [I] Idle time: 0ms
[03/29/2024-17:22:18] [I] Streams: 1
[03/29/2024-17:22:18] [I] ExposeDMA: Disabled
[03/29/2024-17:22:18] [I] Data transfers: Enabled
[03/29/2024-17:22:18] [I] Spin-wait: Disabled
[03/29/2024-17:22:18] [I] Multithreading: Disabled
[03/29/2024-17:22:18] [I] CUDA Graph: Disabled
[03/29/2024-17:22:18] [I] Separate profiling: Disabled
[03/29/2024-17:22:18] [I] Time Deserialize: Disabled
[03/29/2024-17:22:18] [I] Time Refit: Disabled
[03/29/2024-17:22:18] [I] NVTX verbosity: 0
[03/29/2024-17:22:18] [I] Persistent Cache Ratio: 0
[03/29/2024-17:22:18] [I] Inputs:
[03/29/2024-17:22:18] [I] === Reporting Options ===
[03/29/2024-17:22:18] [I] Verbose: Disabled
[03/29/2024-17:22:18] [I] Averages: 10 inferences
[03/29/2024-17:22:18] [I] Percentiles: 90,95,99
[03/29/2024-17:22:18] [I] Dump refittable layers:Disabled
[03/29/2024-17:22:18] [I] Dump output: Disabled
[03/29/2024-17:22:18] [I] Profile: Disabled
[03/29/2024-17:22:18] [I] Export timing to JSON file:
[03/29/2024-17:22:18] [I] Export output to JSON file:
[03/29/2024-17:22:18] [I] Export profile to JSON file:
[03/29/2024-17:22:18] [I]
Cuda failure: CUDA driver version is insufficient for CUDA runtime version
Aborted (core dumped)

Both engines fail with the same "CUDA driver version is insufficient for CUDA runtime version" error.
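
To confirm it's a driver/runtime mismatch, comparing what the container ships against what the host runs helps (the exact paths vary a bit by L4T release, so treat these as best-effort checks):

    # inside the container: the TensorRT / CUDA the image was built against
    dpkg -l | grep -E 'nvinfer|cuda-cudart'

    # on the Jetson host: the L4T release, and therefore the driver version
    cat /etc/nv_tegra_release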

When I try to upgrade TensorRT with pip inside the container, it just fails:

$  pip install --upgrade tensorrt
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: tensorrt in /usr/local/lib/python3.8/dist-packages (8.5.2.2)
Collecting tensorrt
  Downloading tensorrt-8.6.1.post1.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: tensorrt
  Building wheel for tensorrt (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [64 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib
      creating build/lib/tensorrt
      copying tensorrt/__init__.py -> build/lib/tensorrt
      running egg_info
      writing tensorrt.egg-info/PKG-INFO
      writing dependency_links to tensorrt.egg-info/dependency_links.txt
      writing requirements to tensorrt.egg-info/requires.txt
      writing top-level names to tensorrt.egg-info/top_level.txt
      reading manifest file 'tensorrt.egg-info/SOURCES.txt'
      adding license file 'LICENSE.txt'
      writing manifest file 'tensorrt.egg-info/SOURCES.txt'
      /usr/local/lib/python3.8/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      installing to build/bdist.linux-aarch64/wheel
      running install
      Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com, https://pypi.nvidia.com
      ERROR: Could not find a version that satisfies the requirement tensorrt_libs==8.6.1 (from versions: 9.0.0.post11.dev1, 9.0.0.post12.dev1, 9.0.1.post11.dev4, 9.0.1.post12.dev4, 9.1.0.post11.dev4, 9.1.0.post12.dev4, 9.2.0.post11.dev5, 9.2.0.post12.dev5, 9.3.0.post11.dev1, 9.3.0.post12.dev1)
      ERROR: No matching distribution found for tensorrt_libs==8.6.1
      Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com, https://pypi.nvidia.com
      ERROR: Could not find a version that satisfies the requirement tensorrt_libs==8.6.1 (from versions: 9.0.0.post11.dev1, 9.0.0.post12.dev1, 9.0.1.post11.dev4, 9.0.1.post12.dev4, 9.1.0.post11.dev4, 9.1.0.post12.dev4, 9.2.0.post11.dev5, 9.2.0.post12.dev5, 9.3.0.post11.dev1, 9.3.0.post12.dev1)
      ERROR: No matching distribution found for tensorrt_libs==8.6.1
      Traceback (most recent call last):
        File "/tmp/pip-install-yb_w1k5i/tensorrt_7e4f3d0260464b37877fc72585ffd270/setup.py", line 40, in run_pip_command
          return call_func([sys.executable, "-m", "pip"] + args, env=env)
        File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['/usr/bin/python', '-m', 'pip', 'install', '--extra-index-url', 'https://pypi.nvidia.com', 'tensorrt_libs==8.6.1', 'tensorrt_bindings==8.6.1']' returned non-zero exit status 1.

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-yb_w1k5i/tensorrt_7e4f3d0260464b37877fc72585ffd270/setup.py", line 110, in <module>
          setup(
        File "/usr/local/lib/python3.8/dist-packages/setuptools/__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/usr/local/lib/python3.8/dist-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/local/lib/python3.8/dist-packages/wheel/bdist_wheel.py", line 360, in run
          self.run_command("install")
        File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/local/lib/python3.8/dist-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-yb_w1k5i/tensorrt_7e4f3d0260464b37877fc72585ffd270/setup.py", line 62, in run
          run_pip_command(
        File "/tmp/pip-install-yb_w1k5i/tensorrt_7e4f3d0260464b37877fc72585ffd270/setup.py", line 56, in run_pip_command
          return call_func([pip_path] + args, env=env)
        File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['/usr/local/bin/pip', 'install', '--extra-index-url', 'https://pypi.nvidia.com', 'tensorrt_libs==8.6.1', 'tensorrt_bindings==8.6.1']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tensorrt
  Running setup.py clean for tensorrt
Failed to build tensorrt
ERROR: Could not build wheels for tensorrt, which is required to install pyproject.toml-based projects
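
From the output above, my read is that the tensorrt meta-package just shells out to pip to fetch tensorrt_libs==8.6.1, and no 8.6.1 wheel was ever published for aarch64 (only the 9.x dev builds listed), so the upgrade can't succeed in this container. Since TensorRT on Jetson comes from the L4T base image rather than pip, the fix presumably is a base image that matches JetPack 6, something like (tag assumed from the JetPack 6 DP = L4T r36.2 mapping):

    $ docker pull nvcr.io/nvidia/l4t-jetpack:r36.2.0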

Any tips for further debugging? Should we rebuild the containers for JetPack 6?

burningion commented 5 months ago

Following up here: I tried the container from nanoowl, and although trtexec wasn't on the PATH, it was at /usr/src/tensorrt/bin/trtexec. Using that, I was able to run the engine builds successfully.

Commands became:

$ /usr/src/tensorrt/bin/trtexec --onnx=data/mobile_sam_mask_decoder.onnx \
    --saveEngine=data/mobile_sam_mask_decoder.engine \
    --minShapes=point_coords:1x1x2,point_labels:1x1 \
    --optShapes=point_coords:1x1x2,point_labels:1x1 \
    --maxShapes=point_coords:1x10x2,point_labels:1x10

...

$ /usr/src/tensorrt/bin/trtexec --onnx=data/resnet18_image_encoder.onnx --saveEngine=data/resnet18_image_encoder.engine --fp16
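
To avoid typing the full path every time, putting the TensorRT bin directory on PATH also works (a small convenience, assuming the container keeps trtexec in the same place):

    # make the bundled trtexec available without the full path
    export PATH="/usr/src/tensorrt/bin:$PATH"
    trtexec --onnx=data/resnet18_image_encoder.onnx \
        --saveEngine=data/resnet18_image_encoder.engine \
        --fp16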