Open peter5232 opened 7 months ago
You can try the following:
trtexec --onnx=superpoint_lightglue.onnx --loadEngine=superpoint_lightglue.engine --verbose 2>&1 | tee log
grep "Using random values for input" log
grep "Using random values for output" log
This will show all of the engine's inputs and outputs.
I tried this command and got the following output.
[04/21/2024-23:12:14] [I] Using random values for input desc0
[04/21/2024-23:12:14] [I] Using random values for input desc1
So the engine actually has only two inputs, but the ONNX file has four input tensors:
torch.onnx.export(
    lightglue,
    (kpts0, kpts1, desc0, desc1),
    lightglue_path,
    input_names=["kpts0", "kpts1", "desc0", "desc1"],
    output_names=["matches0", "mscores0"],
    opset_version=17,
    dynamic_axes={
        "kpts0": {1: "num_keypoints0"},
        "kpts1": {1: "num_keypoints1"},
        "desc0": {1: "num_keypoints0"},
        "desc1": {1: "num_keypoints1"},
        "matches0": {0: "num_matches0"},
        "mscores0": {0: "num_matches0"},
    },
)
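One plausible explanation (an assumption, not confirmed in this thread) is that TensorRT prunes network inputs that do not contribute to any output after graph optimization and constant folding, so kpts0 and kpts1 vanish from the engine even though they exist in the ONNX graph. A minimal sketch of that kind of reachability-based pruning, using a made-up toy graph:

```python
def live_inputs(edges, inputs, outputs):
    """Return the inputs that can reach at least one output.

    edges: dict mapping tensor/node name -> list of downstream names.
    An input that reaches no output is dead, and a graph optimizer
    is free to drop it from the compiled engine.
    """
    def reaches_output(node, seen=None):
        if seen is None:
            seen = set()
        if node in outputs:
            return True
        seen.add(node)
        return any(reaches_output(nxt, seen)
                   for nxt in edges.get(node, []) if nxt not in seen)

    return [name for name in inputs if reaches_output(name)]


# Toy graph: desc0/desc1 feed the matcher; kpts0/kpts1 feed nothing
# (e.g. their consumers were constant-folded away).
edges = {
    "desc0": ["matcher"], "desc1": ["matcher"],
    "matcher": ["matches0", "mscores0"],
    "kpts0": [], "kpts1": [],
}
print(live_inputs(edges, ["kpts0", "kpts1", "desc0", "desc1"],
                  {"matches0", "mscores0"}))  # ['desc0', 'desc1']
```

If this is what is happening, the fix is on the export side: make sure kpts0/kpts1 actually influence matches0/mscores0 in the traced graph.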
@peter5232
Can you run the following command:
trtexec --onnx=superpoint_lightglue.onnx --saveEngine=superpoint_lightglue.engine --verbose 2>&1 | tee build.log
and then upload the build.log file?
What does polygraphy inspect model superpoint_lightglue.onnx output? And how many inputs do you see in Netron?
Note that checking inputs/outputs with Netron is not always reliable; sometimes Netron cannot show hidden inputs/outputs.
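A more reliable check than Netron is to read the declared graph inputs directly from the ONNX protobuf. A sketch, assuming the onnx Python package is installed and the file name matches this thread:

```python
def graph_input_names(model):
    """Names of the graph's declared inputs, straight from the protobuf
    (including any input an engine builder might later prune)."""
    return [inp.name for inp in model.graph.input]


if __name__ == "__main__":
    import onnx  # assumes the onnx package is available
    model = onnx.load("superpoint_lightglue.onnx")
    print(graph_input_names(model))
```

If this prints all four names but the built engine reports only two, the inputs were dropped during engine building rather than missing from the export.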
@zerollzeng I came across one such case: a 39 MB ONNX file showed nothing when opened in Netron, but trtexec could still build it successfully.
[05/06/2024-11:23:47] [I] Engine deserialized in 0.113882 sec.
[05/06/2024-11:23:47] [V] [TRT] Total per-runner device persistent memory is 0
[05/06/2024-11:23:47] [V] [TRT] Total per-runner host persistent memory is 0
[05/06/2024-11:23:47] [V] [TRT] Allocated activation device memory of size 0
[05/06/2024-11:23:47] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 39 (MiB)
[05/06/2024-11:23:47] [I] Setting persistentCacheLimit to 0 bytes.
[05/06/2024-11:23:47] [V] Using enqueueV3.
[05/06/2024-11:23:47] [I] Using random values for output 82
[05/06/2024-11:23:47] [I] Created output binding for 82 with dimensions 1x256x200x200
[05/06/2024-11:23:47] [I] Starting inference
[05/06/2024-11:23:50] [I] The e2e network timing is not reported since it is inaccurate due to the extra synchronizations when the profiler is enabled.
[05/06/2024-11:23:50] [I] To show e2e network timing report, add --separateProfileRun to profile layer timing in a separate run or remove --dumpProfile to disable the profiler.
[05/06/2024-11:23:50] [I]
[05/06/2024-11:23:50] [I] === Profile (1032 iterations ) ===
[05/06/2024-11:23:50] [I] Layer Time (ms) Avg. Time (ms) Median Time (ms) Time %
[05/06/2024-11:23:50] [I] Reformatting CopyNode for Output Tensor 0 to 82 384.10 0.3722 0.3758 100.0
[05/06/2024-11:23:50] [I] Total 384.10 0.3722 0.3758 100.0
[05/06/2024-11:23:50] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8510] # trtexec --onnx=positional_encoding_poly.onnx --verbose --dumpProfile
Description
My model has four input tensors: ["kpts0", "kpts1", "desc0", "desc1"].
I converted it to an engine with the following command (onnx file):
trtexec --onnx=superpoint_lightglue.onnx --saveEngine=superpoint_lightglue.engine
But when I use the TensorRT Python API to enumerate the I/O tensors, I only get desc0, desc1, matches0, and mscores0.
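For reference, this is roughly how the I/O tensors can be enumerated with the TensorRT 8.5+ tensor-name API (num_io_tensors / get_tensor_name / get_tensor_mode / get_tensor_shape); the engine file name is the one from this thread:

```python
def list_io_tensors(engine):
    """Return (name, mode, shape) for every I/O tensor of a deserialized
    TensorRT engine, using the 8.5+ tensor-name API."""
    result = []
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        result.append((name,
                       str(engine.get_tensor_mode(name)),
                       tuple(engine.get_tensor_shape(name))))
    return result


if __name__ == "__main__":
    import tensorrt as trt  # requires TensorRT installed on a GPU machine
    logger = trt.Logger(trt.Logger.WARNING)
    with open("superpoint_lightglue.engine", "rb") as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
    for name, mode, shape in list_io_tensors(engine):
        print(name, mode, shape)
```

On the engine described above this would list only desc0, desc1, matches0, and mscores0, since the engine is queried, not the original ONNX graph.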
Environment
TensorRT Version: v8.5.3 and v8.6.1
NVIDIA GPU: 4090
NVIDIA Driver Version: 535.129.03
CUDA Version: 11.8
CUDNN Version: 8.9.6
Operating System:
Python Version (if applicable): 3.11
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 2.1.0
Baremetal or Container (if so, version):