NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

injectImplicitPadding.cpp::makeMaskTensors::412] Error #2406

Closed frankxyy closed 1 year ago

frankxyy commented 2 years ago

I use TensorRT to optimize an ONNX file, and the error in the title occurred (screenshot attached).

I cannot find any information about this error on the Internet. I would really appreciate any help with it. Thanks a lot.

frankxyy commented 2 years ago

I use trtexec to convert, and my trtexec command is:

trtexec --onnx=ddetr_surgeoned_huajian_postprocessed.onnx --workspace=18000 --explicitBatch --minShapes=input:1x3x512x512 --optShapes=input:1x3x768x768 --maxShapes=input:1x3x1024x1024 --shapes=input:1x3x768x768 --saveEngine=ddetr_surgeoned_huajian_postprocessed.plan --fp16 --verbose --plugins=/trt_optimize/mmcv/mmcv/ops/csrc/tensorrt/grid_sampler/GridSampler.so --plugins=/trt_optimize/ocr-engine/Serving/ONNX/rec_res18_transformer/LayerNormPlugin/LayerNorm.so
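As background on the flags above: --minShapes/--optShapes/--maxShapes define an optimization profile, and at runtime each input shape must fall within the min/max bounds of that profile. A minimal pure-Python sketch of that bounds check (illustrative names only, not TensorRT's actual API):

```python
# Illustrative sketch of how a min/max optimization profile constrains
# runtime input shapes (not TensorRT's implementation).

def shape_in_profile(shape, min_shape, max_shape):
    """Return True if every dimension of `shape` lies within [min, max]."""
    if not (len(shape) == len(min_shape) == len(max_shape)):
        return False
    return all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape))

# Profile from the trtexec command above: 1x3x512x512 .. 1x3x1024x1024
MIN, MAX = (1, 3, 512, 512), (1, 3, 1024, 1024)

print(shape_in_profile((1, 3, 768, 768), MIN, MAX))  # True: the --shapes value is in range
print(shape_in_profile((1, 3, 256, 256), MIN, MAX))  # False: below --minShapes
```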
frankxyy commented 2 years ago

When I use the TensorRT Python API to convert, the error below appeared:

[10/19/2022-16:07:47] [TRT] [E] [shapeContext.cpp::foldCastFToI::1399] Error Code 2: Internal Error (Assertion !std::isnan(x) failed. )
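The assertion `!std::isnan(x)` in `foldCastFToI` suggests that constant folding encountered a NaN while folding a float-to-int Cast node, a cast whose result is undefined. A hedged pure-Python illustration of the guard being asserted, assuming that is the failure mode:

```python
import math

def fold_cast_f_to_i(x: float) -> int:
    # Sketch of the invariant TensorRT asserts: folding a Cast(float->int)
    # constant is only valid if the value is a real number, since casting
    # NaN to an integer is undefined behavior.
    if math.isnan(x):
        raise ValueError("cannot fold Cast(float->int): value is NaN")
    return int(x)

print(fold_cast_f_to_i(3.7))  # 3
# fold_cast_f_to_i(float("nan"))  # would raise ValueError
```

In practice this usually means some constant subgraph in the ONNX model evaluates to NaN before the Cast, which is worth checking in the exported model.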

zerollzeng commented 2 years ago

Can you provide the ONNX file and the plugin .so files so we can reproduce the issue?

frankxyy commented 2 years ago

@zerollzeng The onnx file is more than 300 MB, which is bigger than the GitHub limit. Could you suggest another place to upload it? Thanks

zerollzeng commented 2 years ago

Can you upload it to Google Drive? Or use git-lfs to push it to some repo.

frankxyy commented 2 years ago

@zerollzeng Hi, I have uploaded the files to Google Drive: https://drive.google.com/drive/folders/17wAieyRkQwCd7_KOHSvpwthDBYJC5TAZ?usp=sharing

zerollzeng commented 2 years ago

Your model has a fixed input shape, so specifying dynamic shapes won't work.

I can see it works in TRT 8.5:

[10/27/2022-08:21:33] [I] === Performance summary ===
[10/27/2022-08:21:33] [I] Throughput: 54.0278 qps
[10/27/2022-08:21:33] [I] Latency: min = 18.3606 ms, max = 21.1658 ms, mean = 18.5114 ms, median = 18.4764 ms, percentile(90%) = 18.5352 ms, percentile(95%) = 18.5676 ms, percentile(99%) = 21.117 ms
[10/27/2022-08:21:33] [I] Enqueue Time: min = 18.2969 ms, max = 21.0941 ms, mean = 18.4478 ms, median = 18.4125 ms, percentile(90%) = 18.4719 ms, percentile(95%) = 18.5032 ms, percentile(99%) = 21.0548 ms
[10/27/2022-08:21:33] [I] H2D Latency: min = 0.580536 ms, max = 0.619568 ms, mean = 0.59597 ms, median = 0.583618 ms, percentile(90%) = 0.614014 ms, percentile(95%) = 0.615234 ms, percentile(99%) = 0.618896 ms
[10/27/2022-08:21:33] [I] GPU Compute Time: min = 17.7249 ms, max = 20.5209 ms, mean = 17.8606 ms, median = 17.8259 ms, percentile(90%) = 17.8878 ms, percentile(95%) = 17.9041 ms, percentile(99%) = 20.4831 ms
[10/27/2022-08:21:33] [I] D2H Latency: min = 0.050415 ms, max = 0.0653687 ms, mean = 0.054852 ms, median = 0.0537109 ms, percentile(90%) = 0.0612793 ms, percentile(95%) = 0.0615234 ms, percentile(99%) = 0.0631714 ms
[10/27/2022-08:21:33] [I] Total Host Walltime: 3.03547 s
[10/27/2022-08:21:33] [I] Total GPU Compute Time: 2.92913 s
[10/27/2022-08:21:33] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[10/27/2022-08:21:33] [W]   If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[10/27/2022-08:21:33] [W] * GPU compute time is unstable, with coefficient of variance = 1.66167%.
[10/27/2022-08:21:33] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[10/27/2022-08:21:33] [I] Explanations of the performance metrics are printed in the verbose logs.
[10/27/2022-08:21:33] [V]
[10/27/2022-08:21:33] [V] === Explanations of the performance metrics ===
[10/27/2022-08:21:33] [V] Total Host Walltime: the host walltime from when the first query (after warmups) is enqueued to when the last query is completed.
[10/27/2022-08:21:33] [V] GPU Compute Time: the GPU latency to execute the kernels for a query.
[10/27/2022-08:21:33] [V] Total GPU Compute Time: the summation of the GPU Compute Time of all the queries. If this is significantly shorter than Total Host Walltime, the GPU may be under-utilized because of host-side overheads or data transfers.
[10/27/2022-08:21:33] [V] Throughput: the observed throughput computed by dividing the number of queries by the Total Host Walltime. If this is significantly lower than the reciprocal of GPU Compute Time, the GPU may be under-utilized because of host-side overheads or data transfers.
[10/27/2022-08:21:33] [V] Enqueue Time: the host latency to enqueue a query. If this is longer than GPU Compute Time, the GPU may be under-utilized.
[10/27/2022-08:21:33] [V] H2D Latency: the latency for host-to-device data transfers for input tensors of a single query.
[10/27/2022-08:21:33] [V] D2H Latency: the latency for device-to-host data transfers for output tensors of a single query.
[10/27/2022-08:21:33] [V] Latency: the summation of H2D Latency, GPU Compute Time, and D2H Latency. This is the latency to infer a single query.
[10/27/2022-08:21:33] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8501] # trtexec --onnx=ddetr_surgeoned_huajian_postprocessed.onnx --fp16 --verbose --plugins=./GridSampler.so --plugins=./LayerNorm.so
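Whether an ONNX input is actually dynamic can be seen from its dimension list: dynamic axes typically appear as a symbolic name (dim_param) or as -1/unknown rather than a concrete integer. A small illustrative helper for that check, working on a plain list of dims rather than any particular ONNX API:

```python
def is_dynamic(dims):
    """True if any dimension is symbolic (a string) or unknown (-1/None)."""
    return any(isinstance(d, str) or d in (-1, None) for d in dims)

print(is_dynamic([1, 3, 768, 768]))           # False: fully static input
print(is_dynamic([1, 3, "height", "width"]))  # True: symbolic H and W
```

Inspecting the model this way before building would have caught that the uploaded model was the static-shape one.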
frankxyy commented 2 years ago

Sorry, I uploaded the wrong model, the one with the static shape, by mistake. I will check and upload the model with the dynamic shape later.

frankxyy commented 2 years ago

Hi @zerollzeng, I have uploaded the correct ONNX file, named ddetr_dynamic.onnx, to Google Drive: https://drive.google.com/drive/folders/17wAieyRkQwCd7_KOHSvpwthDBYJC5TAZ?usp=sharing. I have checked that this is the dynamic model that caused the error referred to above. Very sorry for the inconvenience caused by my carelessness.

zerollzeng commented 1 year ago

Did you try it with the latest 8.5? We are about to release the 8.5 GA, or you can take a quick try with our 22.09 docker image, which has an EA version.

frankxyy commented 1 year ago

Hi @zerollzeng, when I run trtexec in the 22.09 image, the error information changed to:

[10/31/2022-04:31:31] [E] Error[2]: [optimizer.cpp::getFormatRequirements::3015] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. no supported formats)
[10/31/2022-04:31:31] [E] Error[2]: [builder.cpp::buildSerializedNetwork::738] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
zerollzeng commented 1 year ago

Can you try the 8.5 GA? It has been released, and it passes in my environment:

&&&& PASSED TensorRT.trtexec [TensorRT v8501] # trtexec --onnx=ddetr_dynamic.onnx --minShapes=input:1x3x512x512 --optShapes=input:1x3x768x768 --maxShapes=input:1x3x1024x1024 --fp16 --verbose --plugins=./GridSampler.so --plugins=./LayerNorm.so
frankxyy commented 1 year ago

@zerollzeng When will the docker image with 8.5 GA be published?

zerollzeng commented 1 year ago

Oh, it looks like 22.10 does not contain 8.5 GA, but I think you can download the release package and use it in the 22.10 container.

nvpohanh commented 1 year ago

22.11 container has TRT 8.5 GA.

ttyio commented 1 year ago

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!