mokadevcloud opened this issue 2 years ago
Can you show the runtime log with and without TRT?
I think you should try trt_fp16.
device='GPU',
run_mode='trt_fp32',
batch_size=1,
@mokadevcloud There was a problem with TRT speed, and it has been fixed. You could try setting use_dynamic_shape=True and using TensorRT > 7.1.3.
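For reference, a minimal sketch of what dynamic-shape TensorRT looks like at the Paddle Inference level, which is roughly what the use_dynamic_shape path in the deploy scripts configures under the hood. The model paths, the input name "image", and the shape ranges below are assumptions and must match the exported SOLOv2 model and the image sizes actually fed to it:

import paddle.inference as paddle_infer

# Hypothetical paths to the exported inference model
config = paddle_infer.Config("model.pdmodel", "model.pdiparams")
config.enable_use_gpu(2000, 0)  # (initial GPU memory pool in MB, device id)
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=paddle_infer.PrecisionType.Half,  # i.e. trt_fp16
    use_static=False,
    use_calib_mode=False)
# Dynamic-shape ranges (min / max / optimal); the values here are placeholders
config.set_trt_dynamic_shape_info(
    {"image": [1, 3, 320, 320]},
    {"image": [1, 3, 1344, 1344]},
    {"image": [1, 3, 736, 1280]})
predictor = paddle_infer.create_predictor(config)

If no TensorRT subgraphs are actually created (for example because unsupported ops break the graph into pieces smaller than min_subgraph_size), execution falls back to the native GPU kernels and the timings will look essentially identical to run_mode='paddle'.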
Hi @yghstill, just tried again and still no difference. I'm on branch release/2.3. Should I try develop? Thanks
Tried develop as well; no difference so far. Here are the logs from run_benchmark=True on a 1280 x 720 image:
run_mode=paddle:
2022-03-04 13:39:35,465 - benchmark_utils - INFO - ---------------------- Paddle info ----------------------
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] paddle_version: 2.2.2
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] paddle_commit: b031c389938bfa15e15bb20494c76f86289d77b0
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] paddle_branch: HEAD
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] log_api_version: 1.0.3
2022-03-04 13:39:35,465 - benchmark_utils - INFO - ----------------------- Conf info -----------------------
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] runtime_device: gpu
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] ir_optim: True
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] enable_memory_optim: True
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] enable_tensorrt: False
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] enable_mkldnn: False
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] cpu_math_library_num_threads: 1
2022-03-04 13:39:35,465 - benchmark_utils - INFO - ----------------------- Model info ----------------------
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] model_name: solov2_r101_vd_fpn_3x_coco
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] precision: paddle
2022-03-04 13:39:35,465 - benchmark_utils - INFO - ----------------------- Data info -----------------------
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] batch_size: 1
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] input_shape: dynamic_shape
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] data_num: 1
2022-03-04 13:39:35,465 - benchmark_utils - INFO - ----------------------- Perf info -----------------------
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] cpu_rss(MB): 4505, cpu_vms: 0, cpu_shared_mb: 0, cpu_dirty_mb: 0, cpu_util: 0%
2022-03-04 13:39:35,466 - benchmark_utils - INFO - [DET] gpu_rss(MB): 5003, gpu_util: 75.0%, gpu_mem_util: 0%
2022-03-04 13:39:35,466 - benchmark_utils - INFO - [DET] total time spent(s): 0.1627
2022-03-04 13:39:35,466 - benchmark_utils - INFO - [DET] preprocess_time(ms): 71.2, inference_time(ms): 91.5, postprocess_time(ms): 0.0, tracking_time(ms): 0.0
run_mode=trt_fp16:
2022-03-04 13:40:27,863 - benchmark_utils - INFO - ---------------------- Paddle info ----------------------
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] paddle_version: 2.2.2
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] paddle_commit: b031c389938bfa15e15bb20494c76f86289d77b0
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] paddle_branch: HEAD
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] log_api_version: 1.0.3
2022-03-04 13:40:27,863 - benchmark_utils - INFO - ----------------------- Conf info -----------------------
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] runtime_device: gpu
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] ir_optim: True
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] enable_memory_optim: True
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] enable_tensorrt: True
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] enable_mkldnn: False
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] cpu_math_library_num_threads: 1
2022-03-04 13:40:27,863 - benchmark_utils - INFO - ----------------------- Model info ----------------------
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] model_name: solov2_r101_vd_fpn_3x_coco
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] precision: fp16
2022-03-04 13:40:27,863 - benchmark_utils - INFO - ----------------------- Data info -----------------------
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] batch_size: 1
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] input_shape: dynamic_shape
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] data_num: 1
2022-03-04 13:40:27,864 - benchmark_utils - INFO - ----------------------- Perf info -----------------------
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] cpu_rss(MB): 4514, cpu_vms: 0, cpu_shared_mb: 0, cpu_dirty_mb: 0, cpu_util: 0%
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] gpu_rss(MB): 5001, gpu_util: 79.0%, gpu_mem_util: 0%
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] total time spent(s): 0.1663
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] preprocess_time(ms): 73.0, inference_time(ms): 93.3, postprocess_time(ms): 0.0, tracking_time(ms): 0.0
We are using TensorRT 8.2.3:
ii libnvinfer-bin 8.2.3-1+cuda11.4 amd64 TensorRT binaries
ii libnvinfer-dev 8.2.3-1+cuda11.4 amd64 TensorRT development libraries and headers
ii libnvinfer-doc 8.2.3-1+cuda11.4 all TensorRT documentation
ii libnvinfer-plugin-dev 8.2.3-1+cuda11.4 amd64 TensorRT plugin libraries
ii libnvinfer-plugin8 8.2.3-1+cuda11.4 amd64 TensorRT plugin libraries
ii libnvinfer-samples 8.2.3-1+cuda11.4 all TensorRT samples
ii libnvinfer8 8.2.3-1+cuda11.4 amd64 TensorRT runtime libraries
ii libnvonnxparsers-dev 8.2.3-1+cuda11.4 amd64 TensorRT ONNX libraries
ii libnvonnxparsers8 8.2.3-1+cuda11.4 amd64 TensorRT ONNX libraries
ii libnvparsers-dev 8.2.3-1+cuda11.4 amd64 TensorRT parsers libraries
ii libnvparsers8 8.2.3-1+cuda11.4 amd64 TensorRT parsers libraries
ii tensorrt 8.2.3.0-1+cuda11.4 amd64 Meta package of TensorRT
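One sanity check worth doing with this setup is confirming that the installed paddlepaddle-gpu wheel was actually built against a TensorRT version compatible with the 8.2.3 libraries listed above. Recent Paddle releases expose version helpers for this; treat their availability on a 2.2.x wheel as an assumption:

import paddle
import paddle.inference as paddle_infer

print("Paddle:", paddle.__version__)
# Both helpers exist in recent paddle.inference releases; if they are missing,
# the wheel may predate them or may not be built with TensorRT at all.
print("TRT compile version:", paddle_infer.get_trt_compile_version())
print("TRT runtime version:", paddle_infer.get_trt_runtime_version())

A mismatch between the compile-time and runtime TensorRT versions is worth ruling out before digging further.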
@mokadevcloud I have the same issue. Did you solve this?
Hi there, no, we couldn't fix this last year and haven't tried again since, as we moved on to other things. Good luck!
Hi, we are trying to use TensorRT to speed up inference. In particular, we are using DetectorSOLOv2, with a build of paddlepaddle-gpu compiled with TensorRT. However, the inference speed remains about the same whether we use run_mode='paddle', run_mode='trt_fp32', or run_mode='trt_fp16'. This is in contrast with the paddle-ocr repo, where we do observe a big speedup, especially with fp16. Any idea what the reason could be? Any help is appreciated, thanks.
Relevant code:
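The original snippet is not reproduced in this capture; below is a hedged reconstruction of what the setup likely looks like, assuming the release/2.3 deploy/python/infer.py entry points (PredictConfig and DetectorSOLOv2). Import paths, the model directory, the image path, and the predict call are illustrative and may differ between branches:

# Assumes the PaddleDetection repo root is on sys.path; deploy/python/infer.py
# is normally run as a script, so the import path below is a guess.
from deploy.python.infer import PredictConfig, DetectorSOLOv2

MODEL_DIR = "output_inference/solov2_r101_vd_fpn_3x_coco"  # hypothetical export path

# release/2.3 passes pred_config explicitly; newer branches fold it into the detector.
pred_config = PredictConfig(MODEL_DIR)
detector = DetectorSOLOv2(
    pred_config,
    MODEL_DIR,
    device='GPU',
    run_mode='trt_fp16',   # or 'paddle' / 'trt_fp32'
    batch_size=1,
    trt_min_shape=1,
    trt_max_shape=1280,
    trt_opt_shape=640,
    trt_calib_mode=False)

# Hypothetical call; the exact predict method and signature depend on the branch.
results = detector.predict(['demo.jpg'], threshold=0.5)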