Open tsaizhenling opened 2 months ago
This looks like a bug in that version; try using TensorRT v8.6. @tsaizhenling
On my machine (TensorRT v8.6.1), the engine builds successfully; here is the log:
[10/03/2024-10:47:59] [I] === Performance summary ===
[10/03/2024-10:47:59] [I] Throughput: 130.298 qps
[10/03/2024-10:47:59] [I] Latency: min = 5.38672 ms, max = 13.3664 ms, mean = 7.60628 ms, median = 7.16092 ms, percentile(90%) = 10.2996 ms, percentile(95%) = 10.8698 ms, percentile(99%) = 12.9675 ms
[10/03/2024-10:47:59] [I] Enqueue Time: min = 2.72876 ms, max = 15.0201 ms, mean = 7.63062 ms, median = 7.86694 ms, percentile(90%) = 10.1792 ms, percentile(95%) = 11.7217 ms, percentile(99%) = 14.1848 ms
[10/03/2024-10:47:59] [I] H2D Latency: min = 0.0065918 ms, max = 0.0698242 ms, mean = 0.0122212 ms, median = 0.00814819 ms, percentile(90%) = 0.0253906 ms, percentile(95%) = 0.0270996 ms, percentile(99%) = 0.0515137 ms
[10/03/2024-10:47:59] [I] GPU Compute Time: min = 5.34595 ms, max = 13.3388 ms, mean = 7.58404 ms, median = 7.14139 ms, percentile(90%) = 10.2881 ms, percentile(95%) = 10.8175 ms, percentile(99%) = 12.9485 ms
[10/03/2024-10:47:59] [I] D2H Latency: min = 0.00390625 ms, max = 0.110596 ms, mean = 0.0100141 ms, median = 0.00439453 ms, percentile(90%) = 0.0254517 ms, percentile(95%) = 0.0319824 ms, percentile(99%) = 0.0994873 ms
[10/03/2024-10:47:59] [I] Total Host Walltime: 3.00082 s
[10/03/2024-10:47:59] [I] Total GPU Compute Time: 2.96536 s
[10/03/2024-10:47:59] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[10/03/2024-10:47:59] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[10/03/2024-10:47:59] [W] * GPU compute time is unstable, with coefficient of variance = 26.7909%.
[10/03/2024-10:47:59] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[10/03/2024-10:47:59] [I] Explanations of the performance metrics are printed in the verbose logs.
[10/03/2024-10:47:59] [V]
[10/03/2024-10:47:59] [V] === Explanations of the performance metrics ===
[10/03/2024-10:47:59] [V] Total Host Walltime: the host walltime from when the first query (after warmups) is enqueued to when the last query is completed.
[10/03/2024-10:47:59] [V] GPU Compute Time: the GPU latency to execute the kernels for a query.
[10/03/2024-10:47:59] [V] Total GPU Compute Time: the summation of the GPU Compute Time of all the queries. If this is significantly shorter than Total Host Walltime, the GPU may be under-utilized because of host-side overheads or data transfers.
[10/03/2024-10:47:59] [V] Throughput: the observed throughput computed by dividing the number of queries by the Total Host Walltime. If this is significantly lower than the reciprocal of GPU Compute Time, the GPU may be under-utilized because of host-side overheads or data transfers.
[10/03/2024-10:47:59] [V] Enqueue Time: the host latency to enqueue a query. If this is longer than GPU Compute Time, the GPU may be under-utilized.
[10/03/2024-10:47:59] [V] H2D Latency: the latency for host-to-device data transfers for input tensors of a single query.
[10/03/2024-10:47:59] [V] D2H Latency: the latency for device-to-host data transfers for output tensors of a single query.
[10/03/2024-10:47:59] [V] Latency: the summation of H2D Latency, GPU Compute Time, and D2H Latency. This is the latency to infer a single query.
[10/03/2024-10:47:59] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8601] # trtexec --verbose --onnx=./parseq_recognizer_fix_dynamicbatch.onnx
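As a sanity check on the summary above, Throughput is defined (per the verbose explanations) as the number of queries divided by Total Host Walltime. A quick sketch with the logged numbers — the query count of 391 is not printed in the log and is inferred here from Throughput × Total Host Walltime:

```python
# Values copied from the trtexec performance summary above.
total_host_walltime_s = 3.00082    # "Total Host Walltime"
reported_throughput_qps = 130.298  # "Throughput"
mean_gpu_compute_ms = 7.58404      # "GPU Compute Time" mean

# Inferred query count (assumption: round-trip of the two logged figures).
queries = round(reported_throughput_qps * total_host_walltime_s)

# Recompute throughput from the inferred count; should match the log.
computed_throughput = queries / total_host_walltime_s

# The reciprocal of mean GPU Compute Time is the best-case rate if the GPU
# were never idle; it being close to the observed throughput matches the
# "Throughput may be bound by Enqueue Time" warning above.
best_case_qps = 1000.0 / mean_gpu_compute_ms

print(queries, round(computed_throughput, 3), round(best_case_qps, 1))
```

The small gap between the observed ~130.3 qps and the ~131.9 qps compute-bound ceiling is consistent with the enqueue-bound warning in the log.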
Description
ONNX-to-TensorRT conversion fails for a model with a dynamic batch dimension.
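For a model with a dynamic batch dimension, trtexec needs an optimization profile supplied via --minShapes/--optShapes/--maxShapes, otherwise the build can fail. A sketch of such an invocation — the input tensor name ("input") and the non-batch dimensions below are assumptions, not taken from this model; check the real names and shapes with a tool such as Netron before running:

```shell
# Hypothetical shapes: replace "input" and the HxW dims with the model's
# actual input name and dimensions.
trtexec --onnx=./parseq_recognizer_fix_dynamicbatch.onnx \
        --minShapes=input:1x3x32x128 \
        --optShapes=input:8x3x32x128 \
        --maxShapes=input:16x3x32x128 \
        --verbose
```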
Environment
TensorRT Version: 8.5.2.2
NVIDIA GPU: Xavier NX
CUDA Version: 12.2
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.12.4
Relevant Files
Model link: https://drive.google.com/file/d/13l9CUXUJOiHfth-ryRFtxuq7vlpm1Kur/view?usp=sharing
Steps To Reproduce
error: