PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

SOLOv2: TensorRT inference: no speed gain #5282

Open mokadevcloud opened 2 years ago

mokadevcloud commented 2 years ago

Hi, we are trying to use TensorRT to speed up inference. In particular, we are using DetectorSOLOv2, with a build of paddlepaddle-gpu compiled with TensorRT support. However, inference speed stays about the same whether we use run_mode='paddle', run_mode='trt_fp32', or run_mode='trt_fp16'. This is in contrast with the PaddleOCR repo, where we do observe a big speedup, especially with fp16.

Any idea what could be the reason? Any help is appreciated, thanks. Relevant code:

    pred_config = PredictConfig(Solov2Config.model_dir)
    detector = DetectorSOLOv2(
        pred_config,
        Solov2Config.model_dir,
        device='GPU',
        run_mode='trt_fp32',
        batch_size=1,
        trt_min_shape=720,
        trt_max_shape=1920,
        trt_opt_shape=1080,
        trt_calib_mode=True,
        cpu_threads=2,
        enable_mkldnn=True)
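When comparing run modes it can help to time the raw inference call directly, excluding preprocessing, with warm-up iterations so one-off costs (such as TensorRT engine building on the first request) are not counted. A minimal stdlib-only sketch; the `infer` callable is a stand-in for the actual detector call (e.g. a lambda wrapping `detector.predict_image`, whose exact signature depends on the PaddleDetection branch in use):

```python
import statistics
import time

def benchmark(infer, warmup=5, iters=20):
    """Time a zero-argument inference callable.

    Runs `warmup` untimed calls first so one-off costs (e.g. TensorRT
    engine construction on the first request) do not skew the numbers,
    then reports mean/stdev in milliseconds over `iters` timed calls.
    """
    for _ in range(warmup):
        infer()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)

if __name__ == "__main__":
    # Stand-in workload; replace with the real detector call.
    mean_ms, stdev_ms = benchmark(lambda: sum(range(10000)))
    print(f"{mean_ms:.2f} ms +/- {stdev_ms:.2f} ms")
```

If the first timed iteration is much slower than the rest even after warm-up, that can be a hint that the TensorRT engine is being rebuilt per call rather than cached.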
lyuwenyu commented 2 years ago

Can you show the runtime logs with and without TRT?

I think you should try trt_fp16:

        device='GPU',
        run_mode='trt_fp32',
        batch_size=1,
yghstill commented 2 years ago

@mokadevcloud There was a problem with TRT speed, and it has been fixed. You could try setting use_dynamic_shape=True, with TensorRT > 7.1.3.
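For context, a configuration sketch of what use_dynamic_shape does under the hood on the `paddle.inference.Config` that PaddleDetection's deploy code builds. This is a hedged illustration, not the repo's exact code: the model paths, shape ranges, and the input tensor name `image` are assumptions, and the actual input names depend on the exported SOLOv2 model.

```python
from paddle.inference import Config, PrecisionType, create_predictor

config = Config("model.pdmodel", "model.pdiparams")  # placeholder paths
config.enable_use_gpu(200, 0)  # initial GPU memory (MB), device id
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Half,  # trt_fp16
    use_static=False,
    use_calib_mode=False)
# Register dynamic shape ranges so TensorRT can build engines for
# variable input sizes; without this, subgraphs with dynamic shapes
# may silently fall back to the plain Paddle executor.
config.set_trt_dynamic_shape_info(
    {"image": [1, 3, 720, 720]},    # min
    {"image": [1, 3, 1920, 1920]},  # max
    {"image": [1, 3, 1080, 1080]})  # opt
predictor = create_predictor(config)
```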

mokadevcloud commented 2 years ago

Hi @yghstill, just tried again, still no difference. I'm on branch release/2.3. Should I try develop? Thanks

mokadevcloud commented 2 years ago

Tried develop as well. No difference so far. Here are the logs from run_benchmark=True on a 1280 x 720 image:

run_mode=paddle:

2022-03-04 13:39:35,465 - benchmark_utils - INFO - ---------------------- Paddle info ----------------------
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] paddle_version: 2.2.2
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] paddle_commit: b031c389938bfa15e15bb20494c76f86289d77b0
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] paddle_branch: HEAD
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] log_api_version: 1.0.3
2022-03-04 13:39:35,465 - benchmark_utils - INFO - ----------------------- Conf info -----------------------
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] runtime_device: gpu
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] ir_optim: True
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] enable_memory_optim: True
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] enable_tensorrt: False
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] enable_mkldnn: False
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] cpu_math_library_num_threads: 1
2022-03-04 13:39:35,465 - benchmark_utils - INFO - ----------------------- Model info ----------------------
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] model_name: solov2_r101_vd_fpn_3x_coco
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] precision: paddle
2022-03-04 13:39:35,465 - benchmark_utils - INFO - ----------------------- Data info -----------------------
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] batch_size: 1
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] input_shape: dynamic_shape
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] data_num: 1
2022-03-04 13:39:35,465 - benchmark_utils - INFO - ----------------------- Perf info -----------------------
2022-03-04 13:39:35,465 - benchmark_utils - INFO - [DET] cpu_rss(MB): 4505, cpu_vms: 0, cpu_shared_mb: 0, cpu_dirty_mb: 0, cpu_util: 0%
2022-03-04 13:39:35,466 - benchmark_utils - INFO - [DET] gpu_rss(MB): 5003, gpu_util: 75.0%, gpu_mem_util: 0%
2022-03-04 13:39:35,466 - benchmark_utils - INFO - [DET] total time spent(s): 0.1627
2022-03-04 13:39:35,466 - benchmark_utils - INFO - [DET] preprocess_time(ms): 71.2, inference_time(ms): 91.5, postprocess_time(ms): 0.0, tracking_time(ms): 0.0

run_mode=trt_fp16:

2022-03-04 13:40:27,863 - benchmark_utils - INFO - ---------------------- Paddle info ----------------------
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] paddle_version: 2.2.2
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] paddle_commit: b031c389938bfa15e15bb20494c76f86289d77b0
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] paddle_branch: HEAD
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] log_api_version: 1.0.3
2022-03-04 13:40:27,863 - benchmark_utils - INFO - ----------------------- Conf info -----------------------
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] runtime_device: gpu
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] ir_optim: True
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] enable_memory_optim: True
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] enable_tensorrt: True
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] enable_mkldnn: False
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] cpu_math_library_num_threads: 1
2022-03-04 13:40:27,863 - benchmark_utils - INFO - ----------------------- Model info ----------------------
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] model_name: solov2_r101_vd_fpn_3x_coco
2022-03-04 13:40:27,863 - benchmark_utils - INFO - [DET] precision: fp16
2022-03-04 13:40:27,863 - benchmark_utils - INFO - ----------------------- Data info -----------------------
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] batch_size: 1
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] input_shape: dynamic_shape
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] data_num: 1
2022-03-04 13:40:27,864 - benchmark_utils - INFO - ----------------------- Perf info -----------------------
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] cpu_rss(MB): 4514, cpu_vms: 0, cpu_shared_mb: 0, cpu_dirty_mb: 0, cpu_util: 0%
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] gpu_rss(MB): 5001, gpu_util: 79.0%, gpu_mem_util: 0%
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] total time spent(s): 0.1663
2022-03-04 13:40:27,864 - benchmark_utils - INFO - [DET] preprocess_time(ms): 73.0, inference_time(ms): 93.3, postprocess_time(ms): 0.0, tracking_time(ms): 0.0
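Reading the two perf sections side by side: inference_time is 91.5 ms for run_mode=paddle vs 93.3 ms for trt_fp16, so TensorRT is effectively a no-op here even though the log reports enable_tensorrt: True. A small stdlib helper to pull that number out of benchmark logs like the ones above, for anyone automating the comparison:

```python
import re

def inference_ms(log_text):
    """Extract inference_time(ms) from a benchmark_utils perf log, or None."""
    m = re.search(r"inference_time\(ms\):\s*([\d.]+)", log_text)
    return float(m.group(1)) if m else None

paddle_line = "[DET] preprocess_time(ms): 71.2, inference_time(ms): 91.5, postprocess_time(ms): 0.0"
trt_line = "[DET] preprocess_time(ms): 73.0, inference_time(ms): 93.3, postprocess_time(ms): 0.0"
print(inference_ms(paddle_line), inference_ms(trt_line))  # 91.5 93.3
```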

We are using TensorRT 8.2.3:

ii  libnvinfer-bin                                              8.2.3-1+cuda11.4                                       amd64        TensorRT binaries
ii  libnvinfer-dev                                              8.2.3-1+cuda11.4                                       amd64        TensorRT development libraries and headers
ii  libnvinfer-doc                                              8.2.3-1+cuda11.4                                       all          TensorRT documentation
ii  libnvinfer-plugin-dev                                       8.2.3-1+cuda11.4                                       amd64        TensorRT plugin libraries
ii  libnvinfer-plugin8                                          8.2.3-1+cuda11.4                                       amd64        TensorRT plugin libraries
ii  libnvinfer-samples                                          8.2.3-1+cuda11.4                                       all          TensorRT samples
ii  libnvinfer8                                                 8.2.3-1+cuda11.4                                       amd64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                                        8.2.3-1+cuda11.4                                       amd64        TensorRT ONNX libraries
ii  libnvonnxparsers8                                           8.2.3-1+cuda11.4                                       amd64        TensorRT ONNX libraries
ii  libnvparsers-dev                                            8.2.3-1+cuda11.4                                       amd64        TensorRT parsers libraries
ii  libnvparsers8                                               8.2.3-1+cuda11.4                                       amd64        TensorRT parsers libraries
ii  tensorrt                                                    8.2.3.0-1+cuda11.4                                     amd64        Meta package of TensorRT
moonnyeon commented 1 year ago

@mokadevcloud I have the same issue. Did you solve it?

mokadevcloud commented 1 year ago

Hi there, no, we couldn't fix this last year, and we haven't tried again since, as we moved on to other things. Good luck!