Different throughput running deepstream vs trtexec mode

Hi all, I am running YOLOv3 with DeepStream 5.1. I ran the saved optimized engine with the trtexec command to double-check that I get consistent results; instead, I got different throughput. Any ideas about what is going on?

Throughput FPS (avg), INT8, BS=1:

| Configuration | Throughput (FPS) |
| --- | --- |
| Running TensorRT engine with DeepStream 5.1 | 292 |
| Running TensorRT engine in standalone mode (trtexec) | 201 |

Running with DeepStream

$ deepstream-app -c deepstream_app_config_yoloV3.txt

Performance:

IOU Tracker Init with threshold 0.100000
****PERF:  292.03 (291.97)**
** INFO: <bus_callback:204>: Received EOS. Exiting ...

Running with trtexec command

$ /usr/src/tensorrt/bin/trtexec --plugins=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so --loadEngine=model_b1_gpu0_int8.engine --int8

Performance:

[03/04/2021-02:07:59] [I] min: 5.7467 ms (end to end 9.50974 ms)
[03/04/2021-02:07:59] [I] max: 9.63377 ms (end to end 18.3497 ms)
[03/04/2021-02:07:59] [I] mean: 5.92314 ms (end to end 9.8676 ms)
[03/04/2021-02:07:59] [I] median: 5.89404 ms (end to end 9.80298 ms)
[03/04/2021-02:07:59] [I] percentile: 6.07446 ms at 99% (end to end 10.1627 ms at 99%)
**[03/04/2021-02:07:59] [I] throughput: 201.199 qps**
[03/04/2021-02:07:59] [I] walltime: 3.01691 s
[03/04/2021-02:07:59] [I] Enqueue Time
[03/04/2021-02:07:59] [I] min: 0.516113 ms
[03/04/2021-02:07:59] [I] max: 0.815536 ms
[03/04/2021-02:07:59] [I] median: 0.520508 ms
[03/04/2021-02:07:59] [I] GPU Compute
[03/04/2021-02:07:59] [I] min: 4.7764 ms
[03/04/2021-02:07:59] [I] max: 8.66505 ms
[03/04/2021-02:07:59] [I] mean: 4.95182 ms
[03/04/2021-02:07:59] [I] median: 4.92303 ms
[03/04/2021-02:07:59] [I] percentile: 5.10498 ms at 99%
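
As a sanity check on the numbers above, the logged times can be converted to rates and back. The sketch below is plain arithmetic on values copied from the two logs; it does not explain the gap, and the constant and function names are made up for illustration.

```python
# Values copied from the DeepStream and trtexec logs above.
TRTEXEC_MEAN_LATENCY_MS = 5.92314      # "mean" latency line
TRTEXEC_MEAN_GPU_COMPUTE_MS = 4.95182  # "GPU Compute" mean
TRTEXEC_THROUGHPUT_QPS = 201.199       # "throughput" line
DEEPSTREAM_FPS = 292.03                # "**PERF" line


def ms_to_rate(milliseconds: float) -> float:
    """Rate per second if one item completes every `milliseconds`."""
    return 1000.0 / milliseconds


def rate_to_ms(rate_per_second: float) -> float:
    """Average time per item, in ms, implied by a rate per second."""
    return 1000.0 / rate_per_second


if __name__ == "__main__":
    # Rate implied by trtexec's mean latency.
    print(f"from mean latency:     {ms_to_rate(TRTEXEC_MEAN_LATENCY_MS):.1f} qps")
    # Rate implied by trtexec's mean GPU compute time.
    print(f"from mean GPU compute: {ms_to_rate(TRTEXEC_MEAN_GPU_COMPUTE_MS):.1f} qps")
    # Per-item time implied by each reported throughput.
    print(f"trtexec per query:     {rate_to_ms(TRTEXEC_THROUGHPUT_QPS):.2f} ms")
    print(f"DeepStream per frame:  {rate_to_ms(DEEPSTREAM_FPS):.2f} ms")
```

With these inputs, the mean GPU compute time corresponds to roughly 202 qps, which is close to the 201.199 qps that trtexec reports, while the 292 FPS from DeepStream implies about 3.4 ms per frame, less than the 4.95 ms mean GPU compute time that trtexec measures for the same engine.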

NVIDIA-AI-IOT / yolo_deepstream
