decadance-dance opened 2 months ago
Try to add --builderOptimizationLevel=5
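For reference, a full trtexec invocation with that flag might look like this (model.onnx and model.plan are placeholder names, not taken from the thread):

```shell
# Rebuild the engine at the highest builder optimization level (0-5).
# Higher levels let the builder evaluate more tactics at the cost of build time.
trtexec --onnx=model.onnx \
        --saveEngine=model.plan \
        --builderOptimizationLevel=5
```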
@lix19937 I added this flag but got:
[08/01/2024-08:23:19] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 68727865856 detected for tactic 0x0000000000000018.
[08/01/2024-08:23:20] [W] [TRT] Tactic Device request: 65544MB Available: 45525MB. Device memory is insufficient to use tactic.
[08/01/2024-08:23:20] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 1 due to insufficient memory on requested size of 68727865856 detected for tactic 0x0000000000000019.
[08/01/2024-08:23:20] [W] [TRT] Tactic Device request: 65544MB Available: 45525MB. Device memory is insufficient to use tactic.
[08/01/2024-08:23:20] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 2 due to insufficient memory on requested size of 68727865856 detected for tactic 0x000000000000001a.
[08/01/2024-08:23:20] [W] [TRT] Tactic Device request: 65544MB Available: 45525MB. Device memory is insufficient to use tactic.
[08/01/2024-08:23:20] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 3 due to insufficient memory on requested size of 68727865856 detected for tactic 0x000000000000001b.
[08/01/2024-08:23:20] [W] [TRT] Tactic Device request: 65544MB Available: 45525MB. Device memory is insufficient to use tactic.
[08/01/2024-08:23:20] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 4 due to insufficient memory on requested size of 68727865856 detected for tactic 0x000000000000001f.
Why is 45 GB of VRAM insufficient?
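It is not that 45 GB is small in general; that particular tactic's scratch request alone is about 64 GiB, which the shell arithmetic below confirms from the byte count in the log:

```shell
# Convert the requested size (68727865856 bytes) from the log.
echo $((68727865856 / 1024 / 1024))          # prints 65544 (MiB), as in the log
echo $((68727865856 / 1024 / 1024 / 1024))   # prints 64 (GiB)
```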
@lix19937
Despite the GPU memory warnings, I rebuilt the model with --builderOptimizationLevel=5, but the results are very close to the previous ones:
Inferences/Second vs. Client p95 Batch Latency
Concurrency: 1, throughput: 12.5453 infer/sec, latency 81882 usec
Concurrency: 2, throughput: 24.5096 infer/sec, latency 95886 usec
Concurrency: 3, throughput: 28.1778 infer/sec, latency 109527 usec
Concurrency: 4, throughput: 29.4522 infer/sec, latency 168369 usec
So I think either the flag had no effect at all, or the memory warnings degraded the result.
Why is 45 GB of VRAM insufficient?
Yes, you can try to increase the workspace size.
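A minimal sketch of raising the builder workspace limit with trtexec (the size here is illustrative; in TensorRT 10.x the workspace is set through --memPoolSize, whose value is in MiB by default):

```shell
# Let the builder use up to 40 GiB of scratch workspace when timing tactics.
trtexec --onnx=model.onnx \
        --saveEngine=model.plan \
        --builderOptimizationLevel=5 \
        --memPoolSize=workspace:40960
```

Note that tactics whose request exceeds free device memory, like the ~64 GiB one in the log, will still be skipped on a 45 GB card; the builder simply falls back to the best tactic that fits.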
BTW, when comparing different hardware (A30 vs. A40), you should keep the clock frequency stable, compare the power limits, and use the Nsight Systems tool to profile resource utilization.
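A sketch of those suggestions with nvidia-smi and Nsight Systems (clock values and file names are placeholders; run the same steps on both the A30 and the A40 for a fair comparison):

```shell
# List supported clocks, then lock the graphics clock for reproducible runs.
nvidia-smi -q -d SUPPORTED_CLOCKS | head
sudo nvidia-smi --lock-gpu-clocks=1305,1305

# Compare the power limits of the two boards.
nvidia-smi -q -d POWER

# Profile one inference run to inspect GPU utilization.
nsys profile -o trt_report --trace=cuda,nvtx trtexec --loadEngine=model.plan

# Restore default clock behavior when done.
sudo nvidia-smi --reset-gpu-clocks
```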
Description
I am moving from an A30 to an A40, so I needed to rebuild my ONNX model for the A40. I rebuilt it using the same trtexec version, the same command, and the same model, via the same Docker image as on the A30. The image:
nvcr.io/nvidia/tensorrt:24.06-py3
The command:

I benchmark my models on both GPUs using Triton Inference Server 2.47.0 and get:

A30:

A40:
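The concurrency/throughput numbers in the comments above look like perf_analyzer output; a typical invocation against Triton would be something like this (the model name and endpoint are placeholders):

```shell
# Sweep client concurrency 1..4 and report p95 latency, matching the
# "Inferences/Second vs. Client p95 Batch Latency" table in the comments.
perf_analyzer -m my_model \
              -u localhost:8001 -i grpc \
              --concurrency-range 1:4 \
              --percentile=95
```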
Environment
TensorRT Version: 10.1.0.27
NVIDIA GPU: A40
NVIDIA Driver Version: 555.58.02
CUDA Version: 12.1
Operating System: Ubuntu 22.04