NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

num_io_tensors get error of TensorRT 8.5 when running on GPU 4090 #3803

Open peter5232 opened 7 months ago

peter5232 commented 7 months ago

Description

I have four input tensors: ["kpts0", "kpts1", "desc0", "desc1"].

torch.onnx.export(
            lightglue,
            (kpts0, kpts1, desc0, desc1),
            lightglue_path,
            input_names=["kpts0", "kpts1", "desc0", "desc1"],
            output_names=["matches0", "mscores0"],
            opset_version=17,
            dynamic_axes={
                "kpts0": {1: "num_keypoints0"},
                "kpts1": {1: "num_keypoints1"},
                "desc0": {1: "num_keypoints0"},
                "desc1": {1: "num_keypoints1"},
                "matches0": {0: "num_matches0"},
                "mscores0": {0: "num_matches0"},
            },
        )

I build the engine with the following command (ONNX file attached).

trtexec --onnx=superpoint_lightglue.onnx --saveEngine=superpoint_lightglue.engine

But when I use the Python API to list the I/O tensors, I only get desc0, desc1, matches0, and mscores0.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

with open("superpoint_lightglue.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
    tensor_names = [engine.get_tensor_name(i) for i in range(engine.num_io_tensors)]
    print(tensor_names)

I get the following output.

['desc0', 'desc1', 'matches0', 'mscores0']
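Note that num_io_tensors counts inputs and outputs together; engine.get_tensor_mode(name), part of the same name-based API introduced in TensorRT 8.5, tells them apart. A runnable sketch of that split, with a stand-in object replacing the real engine so no GPU is needed (with TensorRT installed, compare against trt.TensorIOMode.INPUT instead of the string used here):

```python
# FakeEngine is a stand-in for trt.ICudaEngine, mirroring the three
# calls used below: num_io_tensors, get_tensor_name, get_tensor_mode.
class FakeEngine:
    def __init__(self, modes):
        self._modes = modes  # name -> "INPUT" or "OUTPUT"

    @property
    def num_io_tensors(self):
        return len(self._modes)

    def get_tensor_name(self, i):
        return list(self._modes)[i]

    def get_tensor_mode(self, name):
        return self._modes[name]


def split_io(engine):
    """Partition an engine's I/O tensor names into (inputs, outputs)."""
    inputs, outputs = [], []
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        (inputs if engine.get_tensor_mode(name) == "INPUT" else outputs).append(name)
    return inputs, outputs


engine = FakeEngine({"desc0": "INPUT", "desc1": "INPUT",
                     "matches0": "OUTPUT", "mscores0": "OUTPUT"})
print(split_io(engine))  # → (['desc0', 'desc1'], ['matches0', 'mscores0'])
```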

Environment

TensorRT Version: v8.5.3 and v8.6.1

NVIDIA GPU: 4090

NVIDIA Driver Version: 535.129.03

CUDA Version: 11.8

CUDNN Version: 8.9.6

Operating System:

Python Version (if applicable): 3.11

Tensorflow Version (if applicable):

PyTorch Version (if applicable): 2.1.0

Baremetal or Container (if so, version):

lix19937 commented 7 months ago

You can try the following:

trtexec --onnx=superpoint_lightglue.onnx  --loadEngine=superpoint_lightglue.engine   --verbose  2>&1 |tee log   

cat log |grep "Using random values for input"   
cat log |grep "Using random values for output"   

This will show all inputs and outputs.
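Equivalently, the same filtering can be done in Python; a small sketch that scans the verbose log for trtexec's "Using random values for input/output <name>" lines (the message format shown elsewhere in this thread):

```python
import re

# Matches trtexec lines like:
#   [04/21/2024-23:12:14] [I] Using random values for input desc0
PATTERN = re.compile(r"Using random values for (input|output) (\S+)")


def io_names_from_log(log_text):
    """Collect (inputs, outputs) tensor names reported in a trtexec log."""
    inputs, outputs = [], []
    for kind, name in PATTERN.findall(log_text):
        (inputs if kind == "input" else outputs).append(name)
    return inputs, outputs


log = """\
[04/21/2024-23:12:14] [I] Using random values for input desc0
[04/21/2024-23:12:14] [I] Using random values for input desc1
[04/21/2024-23:12:14] [I] Using random values for output matches0
"""
print(io_names_from_log(log))  # → (['desc0', 'desc1'], ['matches0'])
```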

peter5232 commented 7 months ago

I tried this command and got the following output.

[04/21/2024-23:12:14] [I] Using random values for input desc0
[04/21/2024-23:12:14] [I] Using random values for input desc1

So the engine actually has only two inputs, but the ONNX file has four input tensors.

torch.onnx.export(
            lightglue,
            (kpts0, kpts1, desc0, desc1),
            lightglue_path,
            input_names=["kpts0", "kpts1", "desc0", "desc1"],
            output_names=["matches0", "mscores0"],
            opset_version=17,
            dynamic_axes={
                "kpts0": {1: "num_keypoints0"},
                "kpts1": {1: "num_keypoints1"},
                "desc0": {1: "num_keypoints0"},
                "desc1": {1: "num_keypoints1"},
                "matches0": {0: "num_matches0"},
                "mscores0": {0: "num_matches0"},
            },
        )
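To pin down exactly which declared inputs disappeared, one can diff the names passed to torch.onnx.export against what the engine reports; a trivial sketch using the names from this thread, where the result suggests kpts0 and kpts1 were dropped somewhere between export and engine build:

```python
# Names declared at torch.onnx.export time vs. names the engine reports.
declared_inputs = {"kpts0", "kpts1", "desc0", "desc1"}
engine_io = {"desc0", "desc1", "matches0", "mscores0"}

# Declared inputs the engine no longer exposes.
missing = sorted(declared_inputs - engine_io)
print(missing)  # → ['kpts0', 'kpts1']
```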
lix19937 commented 7 months ago

@peter5232
Can you run the following command

trtexec --onnx=superpoint_lightglue.onnx  --saveEngine=superpoint_lightglue.engine  --verbose 2>&1 | tee  build.log

and then upload the build.log file?

zerollzeng commented 7 months ago

What does polygraphy inspect model superpoint_lightglue.onnx output? And how many inputs can you see in Netron?

lix19937 commented 6 months ago

Checking inputs/outputs with Netron is not always reliable; sometimes Netron cannot see hidden inputs/outputs.

lix19937 commented 6 months ago

@zerollzeng I came across one case where the ONNX file (39 MB) showed nothing when opened in Netron, but trtexec could still build it successfully.

[05/06/2024-11:23:47] [I] Engine deserialized in 0.113882 sec.
[05/06/2024-11:23:47] [V] [TRT] Total per-runner device persistent memory is 0
[05/06/2024-11:23:47] [V] [TRT] Total per-runner host persistent memory is 0
[05/06/2024-11:23:47] [V] [TRT] Allocated activation device memory of size 0
[05/06/2024-11:23:47] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 39 (MiB)
[05/06/2024-11:23:47] [I] Setting persistentCacheLimit to 0 bytes.
[05/06/2024-11:23:47] [V] Using enqueueV3.
[05/06/2024-11:23:47] [I] Using random values for output 82
[05/06/2024-11:23:47] [I] Created output binding for 82 with dimensions 1x256x200x200
[05/06/2024-11:23:47] [I] Starting inference
[05/06/2024-11:23:50] [I] The e2e network timing is not reported since it is inaccurate due to the extra synchronizations when the profiler is enabled.
[05/06/2024-11:23:50] [I] To show e2e network timing report, add --separateProfileRun to profile layer timing in a separate run or remove --dumpProfile to disable the profiler.
[05/06/2024-11:23:50] [I]
[05/06/2024-11:23:50] [I] === Profile (1032 iterations ) ===
[05/06/2024-11:23:50] [I]                                            Layer   Time (ms)   Avg. Time (ms)   Median Time (ms)   Time %
[05/06/2024-11:23:50] [I]  Reformatting CopyNode for Output Tensor 0 to 82      384.10           0.3722             0.3758    100.0
[05/06/2024-11:23:50] [I]                                            Total      384.10           0.3722             0.3758    100.0
[05/06/2024-11:23:50] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8510] # trtexec --onnx=positional_encoding_poly.onnx --verbose --dumpProfile