PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

On Jetson Xavier NX, trt_int8 inference with the ssdlite_mobilenet_v3 model is actually slower than without trt_int8?? #3030

Open dengxinlong opened 3 years ago

dengxinlong commented 3 years ago

I used static/deploy/python/infer.py (the version under static/). The inference model is ssdlite_mobilenet_v3. Note: this model went through quantization-aware training. I also tested a model without quantization-aware training; the numbers differ slightly, but trt_int8 inference is still slower than running without trt_int8. Run output:

(test) coded@coded-desktop:~/PaddleDetection/static$ python deploy/python/infer.py --model_dir=../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/ --image_file=1478896904942573873.jpg  --use_gpu=True --threshold=0.2 --run_mode=trt_fp16
WARNING: AVX is not support on your machine. Hence, no_avx core will be imported, It has much worse preformance than avx core.
/home/coded/.local/virtualenvs/test/lib/python3.6/site-packages/paddle/utils/cpp_extension/extension_utils.py:461: UserWarning: Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it.
  "Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it."
-----------  Running Arguments -----------
camera_id: -1
image_file: 1478896904942573873.jpg
model_dir: ../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/
output_dir: output
run_benchmark: False
run_mode: trt_fp16
threshold: 0.2
use_gpu: True
video_file: 
------------------------------------------
-----------  Model Configuration -----------
Model Arch: SSD
Use Paddle Executor: False
Transform Order: 
--transform op: Resize
--transform op: Normalize
--transform op: Permute
--------------------------------------------
W0513 20:54:04.807541 18561 analysis_predictor.cc:1145] Deprecated. Please use CreatePredictor instead.
Inference: 57.614803314208984 ms per batch image
class_id:1, confidence:0.4840,left_top:[1101.23,423.78], right_bottom:[1142.89,499.94]
class_id:4, confidence:0.6002,left_top:[1218.22,574.18], right_bottom:[1377.13,638.06]
class_id:4, confidence:0.5065,left_top:[359.96,611.97], right_bottom:[460.14,656.98]
class_id:4, confidence:0.3243,left_top:[650.40,604.32], right_bottom:[697.91,646.41]
class_id:4, confidence:0.2525,left_top:[746.66,601.77], right_bottom:[787.74,639.54]
save result to: output/1478896904942573873.jpg

The run above used trt_fp16, with the quantization-aware-trained model.

Below is the trt_int8 output:

(test) coded@coded-desktop:~/PaddleDetection/static$ python deploy/python/infer.py --model_dir=../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/ --image_file=1478896904942573873.jpg  --use_gpu=True --threshold=0.2 --run_mode=trt_int8
WARNING: AVX is not support on your machine. Hence, no_avx core will be imported, It has much worse preformance than avx core.
/home/coded/.local/virtualenvs/test/lib/python3.6/site-packages/paddle/utils/cpp_extension/extension_utils.py:461: UserWarning: Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it.
  "Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it."
-----------  Running Arguments -----------
camera_id: -1
image_file: 1478896904942573873.jpg
model_dir: ../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/
output_dir: output
run_benchmark: False
run_mode: trt_int8
threshold: 0.2
use_gpu: True
video_file: 
------------------------------------------
-----------  Model Configuration -----------
Model Arch: SSD
Use Paddle Executor: False
Transform Order: 
--transform op: Resize
--transform op: Normalize
--transform op: Permute
--------------------------------------------
W0513 19:59:15.472364 18171 analysis_predictor.cc:1145] Deprecated. Please use CreatePredictor instead.
Inference: 33687.761545181274 ms per batch image
class_id:1, confidence:0.4905,left_top:[1101.20,423.82], right_bottom:[1142.83,499.95]
class_id:4, confidence:0.6004,left_top:[1218.18,574.16], right_bottom:[1377.05,638.02]
class_id:4, confidence:0.5076,left_top:[359.73,611.98], right_bottom:[459.73,656.93]
class_id:4, confidence:0.3248,left_top:[650.41,604.29], right_bottom:[697.92,646.41]
class_id:4, confidence:0.2520,left_top:[746.22,601.75], right_bottom:[787.28,639.51]
save result to: output/1478896904942573873.jpg

trt_fp16 inference takes 57.6 ms, yet trt_int8 takes over 30,000 ms, which makes no sense.

Questions:

1. The gap between the two is enormous, and my model went through quantization-aware training. Why?
2. When I evaluate the trained model with tools/eval.py, it reports 30+ FPS, but the inference times here look far too long in either mode. Are the two measured differently?

Environment: Jetson Xavier NX. Model: ssdlite_mobilenet_v3_large_fpn. Dataset: custom dataset.
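As a sanity check on the FPS-vs-latency comparison in question 2: throughput and per-image latency are reciprocals, so 30+ FPS corresponds to roughly 33 ms per image. A minimal sketch (whether eval.py's FPS includes pre/post-processing is an assumption worth verifying, so the two numbers may not be directly comparable):

```python
def fps_to_ms(fps):
    """Convert throughput (frames per second) to per-image latency in ms."""
    return 1000.0 / fps

latency = fps_to_ms(30)  # ~33.3 ms per image
```

That is in the same ballpark as the 57.6 ms trt_fp16 measurement, but nowhere near the 33,687 ms trt_int8 one, which points to something pathological in the INT8 path rather than a mere difference in measurement method.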

yghstill commented 3 years ago

@dengxinlong Try setting use_calib_mode to False and test again. use_calib_mode only takes effect for TensorRT offline (post-training) quantization; otherwise it hurts inference speed. This has been fixed in the code on the develop branch: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/static/deploy/python/infer.py#L420
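A minimal sketch of what that setting amounts to in the Paddle Inference API (the model paths, GPU memory size, and workspace size below are placeholders, not values from this issue): for a quantization-aware-trained model, build the TensorRT engine with use_calib_mode=False so no calibration pass is triggered.

```python
from paddle.inference import Config, PrecisionType, create_predictor

# Placeholder paths for an exported inference model.
config = Config("model_dir/__model__", "model_dir/__params__")
config.enable_use_gpu(200, 0)  # 200 MB initial GPU memory, device id 0
config.enable_tensorrt_engine(
    workspace_size=1 << 30,            # 1 GB TensorRT workspace
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Int8,
    use_static=False,
    use_calib_mode=False,  # False for QAT models; True only for offline quantization
)
predictor = create_predictor(config)
```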

dengxinlong commented 3 years ago

> @dengxinlong Try setting use_calib_mode to False and test again. use_calib_mode only takes effect for TensorRT offline (post-training) quantization; otherwise it hurts inference speed. This has been fixed in the code on the develop branch: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/static/deploy/python/infer.py#L420

I retested. The code is the PaddleDetection release/2.0 branch cloned a few days ago, running inference via static/deploy/python/infer.py. The model is the trained ssdlite_mobilenet_v3_large_fpn, without quantization-aware training, i.e. just the model produced by tools/train.py.

Method: run inference on 2226 images, record the total time, then take the mean, i.e. the average inference time per image.

trt_int8: 29.82 ms
trt_fp16: 26.64 ms
trt_fp32: 27.58 ms

My question: the ordering is still backwards??
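The averaging method described above can be sketched as follows. One detail worth controlling for: the first TensorRT calls can include engine build time and would otherwise dominate the mean, so a warm-up pass is excluded from the timing (the helper name and warm-up count here are illustrative, not from infer.py):

```python
import time

def average_latency_ms(infer, images, warmup=10):
    """Average per-image latency in ms: run a few untimed warm-up
    inferences first, then divide the total timed run by image count."""
    for img in images[:warmup]:
        infer(img)  # warm-up: engine build happens here, not in the timing
    start = time.perf_counter()
    for img in images:
        infer(img)
    total = time.perf_counter() - start
    return total * 1000.0 / len(images)
```

If the 33,687 ms measurement earlier in this thread timed only the very first image, it may largely reflect one-time engine build or calibration cost rather than steady-state INT8 latency.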