ywfwyht closed this issue 2 years ago.
I got a segfault on TRT 8.2.3 (docker image 22.03):
[10/09/2022-08:43:12] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: -7067026478815706014
[10/09/2022-08:43:12] [V] [TRT] =============== Computing costs for
[10/09/2022-08:43:12] [V] [TRT] *************** Autotuning format combination: Float(1327104,20736,144,1) -> Float(165888,20736,144,1) ***************
[10/09/2022-08:43:12] [V] [TRT] --------------- Timing Runner: {ForeignNode[Reshape_111 + Transpose_112...Reshape_835]} (Myelin)
Segmentation fault (core dumped)
I have no name!@4cff0ed21e85:/workspace$ trtexec --onnx=/zeroz/temp/2377/0903_p28_t3_seg_simp.onnx --verbose
22.08 with TRT 8.4.2:
[10/09/2022-08:31:54] [I] === Performance summary ===
[10/09/2022-08:31:54] [I] Throughput: 68.1918 qps
[10/09/2022-08:31:54] [I] Latency: min = 15.8783 ms, max = 16.3865 ms, mean = 16.1759 ms, median = 16.1797 ms, percentile(99%) = 16.3816 ms
[10/09/2022-08:31:54] [I] Enqueue Time: min = 14.3593 ms, max = 14.8667 ms, mean = 14.6316 ms, median = 14.6232 ms, percentile(99%) = 14.864 ms
[10/09/2022-08:31:54] [I] H2D Latency: min = 1.43982 ms, max = 1.51929 ms, mean = 1.47669 ms, median = 1.47974 ms, percentile(99%) = 1.50665 ms
[10/09/2022-08:31:54] [I] GPU Compute Time: min = 14.3636 ms, max = 14.8584 ms, mean = 14.6431 ms, median = 14.6494 ms, percentile(99%) = 14.8408 ms
[10/09/2022-08:31:54] [I] D2H Latency: min = 0.0529785 ms, max = 0.0578613 ms, mean = 0.0560911 ms, median = 0.0560303 ms, percentile(99%) = 0.0577393 ms
[10/09/2022-08:31:54] [I] Total Host Walltime: 3.03556 s
[10/09/2022-08:31:54] [I] Total GPU Compute Time: 3.03113 s
&&&& PASSED TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=0903_p28_t3_seg_simp.onnx --verbose
TRT 8.5 (22.09) perf is close to 22.08.
@ywfwyht how did you run the model with TRT 8.2.3? Also, there seems to be a gap between your 8.4 result and mine.
If you use 8.2, you must turn on the --best option.
Which is the inference time?
My inference code is based on your sample https://github.com/NVIDIA/TensorRT/blob/main/samples/python/yolov3_onnx/onnx_to_tensorrt.py
@zerollzeng Can you tell me about your environment? I still get an error with TRT 8.4.
When I use polygraphy I get an error, but when I use trtexec I don't.
polygraphy run submodel_backbone.onnx \
--trt \
--onnxrt \
--pool-limit workspace:8G \
--save-engine=submodel_backbone.trt \
--atol 1e-3 --rtol 1e-3 \
--verbose \
--trt-outputs mark all \
--onnx-outputs mark all \
--fail-fast \
--val-range [0,1]
[10/10/2022-02:31:06] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +750, GPU +318, now: CPU 1494, GPU 2208 (MiB)
[10/10/2022-02:31:06] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +127, GPU +60, now: CPU 1621, GPU 2268 (MiB)
[10/10/2022-02:31:06] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[10/10/2022-02:31:06] [TRT] [W] Skipping tactic 0x0000000000000000 due to Myelin error: Formal output tensor "1250 + (Unnamed Layer* 4) [Shuffle]_constant" is also a data tensor.
[10/10/2022-02:31:06] [TRT] [E] 10: [optimizer.cpp::computeCosts::3626] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[Reshape_111...Reshape_835]}.)
[10/10/2022-02:31:06] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
[10/10/2022-02:40:17] [I] === Profile (951 iterations ) ===
[10/10/2022-02:40:17] [I] Layer Time (ms) Avg. Time (ms) Median Time (ms) Time %
[10/10/2022-02:40:17] [I] {ForeignNode[Reshape_111 + Transpose_112...Reshape_835]} 754.56 0.7934 0.7926 86.6
[10/10/2022-02:40:17] [I] Conv_836 117.01 0.1230 0.1229 13.4
[10/10/2022-02:40:17] [I] Total 871.57 0.9165 0.9156 100.0
[10/10/2022-02:40:17] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8403] # /workspace/work_dir/TensorRT-8.4.3.1/bin/trtexec --onnx=submodel_backbone.onnx --saveEngine=/workspace/work_dir/K-Lane/submodel_backbone_trtexec.trt --workspace=12000 --useCudaGraph --dumpProfile
-> Which is the inference time?
The median GPU compute time.
-> Can you tell me about your environment? I still get an error with TRT 8.4.
I used the official docker images; maybe it's due to the CUDA version.
--best enables FP16 and INT8; without it, the engine runs in FP32.
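For reference, a rough Python-API sketch of what --best corresponds to at build time (names from the TRT 8.x API; the ONNX parsing and network construction are omitted here):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()
# With both flags set, the builder is free to pick the fastest kernel
# per layer among FP32/FP16/INT8, which is what trtexec --best does.
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)  # real INT8 accuracy also needs a calibrator or explicit dynamic ranges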
TRT 8.2
[10/11/2022-16:29:05] [I] GPU Compute Time: min = 3.6731 ms, max = 3.76428 ms, mean = 3.72586 ms, median = 3.73242 ms, percentile(99%) = 3.7489 ms
trtexec --onnx=/zeroz/temp/2377/0903_p28_t3_seg_simp.onnx --verbose --best
TRT 8.4
[10/11/2022-16:37:47] [I] GPU Compute Time: min = 3.47852 ms, max = 3.57684 ms, mean = 3.51626 ms, median = 3.52051 ms, percentile(99%) = 3.54614 ms
&&&& PASSED TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=/zeroz/temp/2377/0903_p28_t3_seg_simp.onnx --verbose --best
Looks like no regression in TRT 8.4.
I also suspected it was due to the CUDA version. TensorRT Version: 8.4.3.1, CUDA Version: 11.6, cuDNN Version: 8.4, NVIDIA Driver Version: 515. Looks like no problem there.
@zerollzeng How should I write the inference code? I refer to this, right? https://github.com/NVIDIA/TensorRT/blob/main/samples/python/yolov3_onnx/onnx_to_tensorrt.py
That should work. Please also refer to the Python API docs.
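For a serialized engine, the core of that sample (its common.py helpers) boils down to roughly the following sketch, using the TRT 8.x binding API with pycuda; submodel_backbone.trt and input_array are placeholders for your engine file and preprocessed input:

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize an engine built earlier (e.g. with trtexec --saveEngine).
with open("submodel_backbone.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# One pinned host buffer and one device buffer per binding.
stream = cuda.Stream()
inputs, outputs, bindings = [], [], []
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding))
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host_mem = cuda.pagelocked_empty(size, dtype)
    dev_mem = cuda.mem_alloc(host_mem.nbytes)
    bindings.append(int(dev_mem))
    (inputs if engine.binding_is_input(binding) else outputs).append((host_mem, dev_mem))

# H2D copy, async execute, D2H copy back.
np.copyto(inputs[0][0], input_array.ravel())  # input_array: preprocessed NumPy input
cuda.memcpy_htod_async(inputs[0][1], inputs[0][0], stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for host_mem, dev_mem in outputs:
    cuda.memcpy_dtoh_async(host_mem, dev_mem, stream)
stream.synchronize()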
Description
Hi, guys. After converting the ONNX model linked below to a TRT engine, the inference time is 5 ms with TRT 8.2.3 but 80 ms with TRT 8.4.3.
Environment
TensorRT Version: 8.4.3.1
NVIDIA GPU: 3080Ti
NVIDIA Driver Version: 515
CUDA Version: 11.6
CUDNN Version: 8.4
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.8.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.11
Baremetal or Container (if so, version):
Relevant Files
https://github.com/ywfwyht/onnx_model/blob/main/0903_p28_t3_seg_simp.onnx
Steps To Reproduce
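Presumably, based on the commands used elsewhere in this thread:

trtexec --onnx=0903_p28_t3_seg_simp.onnx --verbose
trtexec --onnx=0903_p28_t3_seg_simp.onnx --verbose --best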