NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Converting ONNX model to TRT engine fails. Reason: Tactic Device request: 1288794MB Available: 14928MB. Device memory is insufficient to use tactic. #3824

Open jiangruoqiao opened 4 months ago

jiangruoqiao commented 4 months ago

Description

Environment

TensorRT Version: 8.6.1

NVIDIA GPU: T4

NVIDIA Driver Version: 525

CUDA Version: 11.4 (nvidia-docker2)

CUDNN Version: cuda_11.4.r11.4/compiler.30521435_0

Operating System:

Python Version (if applicable): Python 3.8

Relevant Files

Model link: https://drive.google.com/file/d/1Ey1FGXz6BCmHDpEvdRra9h_qFrxAh1GN/view?usp=share_link

Error Log

[04/25/2024-19:10:22] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 1288549MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 1288794MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 1288549MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 1288794MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 1288549MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 644274MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 644397MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 644274MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 644274MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 644397MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 644274MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 644274MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 644397MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 644274MB Available: 14928MB. Device memory is insufficient to use tactic.
[04/25/2024-19:10:23] [TRT] [E] 10: Could not find any implementation for node /Reshape_1 + /Transpose.
[04/25/2024-19:10:23] [TRT] [E] 10: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node /Reshape_1 + /Transpose.)
build_serialized_network SUCC
Traceback (most recent call last):
  File "onnx2trt.py", line 178, in <module>
    get_engine(test_shape, "encoder.onnx", "encoder_2.trt")
  File "onnx2trt.py", line 83, in get_engine
    return build_engine()
  File "onnx2trt.py", line 71, in build_engine
    res = runtime.deserialize_cuda_engine(plan)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:

  1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine

Invoked with: <tensorrt.tensorrt.Runtime object at 0x7fe1eedc82b0>, None

Note

        profile.set_shape(network.get_input(0).name, (1, 3, 16, 16), (1, 3, 32, 32), (1, 3, 32, 32))
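The `TypeError` at the end of the log is a secondary symptom: `build_serialized_network()` returns `None` when the build fails, and that `None` is then passed straight to `deserialize_cuda_engine()`. A minimal sketch of a guard that surfaces the real failure (the `deserialize_checked` helper name is illustrative, not part of the TensorRT API):

```python
def deserialize_checked(runtime, plan):
    """Deserialize a TensorRT plan, failing loudly when the build produced nothing.

    `runtime` is expected to be a tensorrt.Runtime and `plan` the return value
    of builder.build_serialized_network(), which is None when the build failed.
    """
    if plan is None:
        # Raise here instead of letting deserialize_cuda_engine(None) produce a
        # confusing "incompatible function arguments" TypeError.
        raise RuntimeError(
            "build_serialized_network() returned None -- the engine build "
            "failed; inspect the [TRT] [E] lines in the build log."
        )
    return runtime.deserialize_cuda_engine(plan)
```

With this guard in `build_engine()`, the script reports the actual build failure ("Could not find any implementation for node /Reshape_1 + /Transpose") instead of crashing on the deserialize call.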
lix19937 commented 4 months ago

[04/25/2024-19:10:23] [TRT] [W] Tactic Device request: 644274MB Available: 14928MB. Device memory is insufficient to use tactic.

It is a warning. You can check with `watch nvidia-smi` to see whether GPU memory is really fully occupied.
To reduce memory use, try adding the `--memPoolSize=<pool_spec>` flag to the command.
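For example, with the trtexec tool (the model name matches the script above; the 4096 MiB workspace cap is an illustrative value for a 16 GB T4, not a recommendation):

```shell
# Cap the builder's workspace pool so tactics requesting huge scratch
# buffers are rejected up front instead of emitting these warnings.
trtexec --onnx=encoder.onnx \
        --saveEngine=encoder_2.trt \
        --memPoolSize=workspace:4096
```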

jiangruoqiao commented 4 months ago

Although it is only a warning, the conversion still fails. To verify whether the failure is caused by OOM, I tried reducing the size of the network input (1024x1024 to 32x32), and the conversion succeeded.

zerollzeng commented 4 months ago

Looks like just an OOM issue.

zerollzeng commented 4 months ago

Your network is too big for the gpu.

jiangruoqiao commented 4 months ago

But the same network works well in onnxruntime-gpu.

lix19937 commented 4 months ago

But the same network works well in onnxruntime-gpu

ONNX Runtime works differently from TensorRT.

You can try trtexec with all tactic sources disabled.
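A hedged sketch of what disabling tactic sources could look like with trtexec's `--tacticSources` flag (the available source names vary by TensorRT version; these are from the 8.x trtexec help):

```shell
# Prefix a source with '-' to disable it; this keeps only the
# built-in tactics, avoiding the library-backed ones.
trtexec --onnx=encoder.onnx \
        --saveEngine=encoder_2.trt \
        --tacticSources=-CUBLAS,-CUBLAS_LT,-CUDNN
```

Note that disabling sources narrows the search space, so some layers may end up without any usable implementation at all; it is a diagnostic step, not a guaranteed fix.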