PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

On Jetson Xavier NX, trt_int8 inference with the ssdlite_mobilenet_v3 model is actually slower than without trt_int8?? #3030

Open dengxinlong opened 3 years ago

dengxinlong commented 3 years ago

I used static/deploy/python/infer.py (the version under static/). The inference model is ssdlite_mobilenet_v3. Note: this model went through quantization-aware training. I also tested a model without quantization-aware training; the numbers differ slightly, but trt_int8 inference is still slower than running without trt_int8. Run output:

(test) coded@coded-desktop:~/PaddleDetection/static$ python deploy/python/infer.py --model_dir=../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/ --image_file=1478896904942573873.jpg  --use_gpu=True --threshold=0.2 --run_mode=trt_fp16
WARNING: AVX is not support on your machine. Hence, no_avx core will be imported, It has much worse preformance than avx core.
/home/coded/.local/virtualenvs/test/lib/python3.6/site-packages/paddle/utils/cpp_extension/extension_utils.py:461: UserWarning: Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it.
  "Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it."
-----------  Running Arguments -----------
camera_id: -1
image_file: 1478896904942573873.jpg
model_dir: ../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/
output_dir: output
run_benchmark: False
run_mode: trt_fp16
threshold: 0.2
use_gpu: True
video_file: 
------------------------------------------
-----------  Model Configuration -----------
Model Arch: SSD
Use Paddle Executor: False
Transform Order: 
--transform op: Resize
--transform op: Normalize
--transform op: Permute
--------------------------------------------
W0513 20:54:04.807541 18561 analysis_predictor.cc:1145] Deprecated. Please use CreatePredictor instead.
Inference: 57.614803314208984 ms per batch image
class_id:1, confidence:0.4840,left_top:[1101.23,423.78], right_bottom:[1142.89,499.94]
class_id:4, confidence:0.6002,left_top:[1218.22,574.18], right_bottom:[1377.13,638.06]
class_id:4, confidence:0.5065,left_top:[359.96,611.97], right_bottom:[460.14,656.98]
class_id:4, confidence:0.3243,left_top:[650.40,604.32], right_bottom:[697.91,646.41]
class_id:4, confidence:0.2525,left_top:[746.66,601.77], right_bottom:[787.74,639.54]
save result to: output/1478896904942573873.jpg

The run above used trt_fp16, with the quantization-aware-trained model.

Below is the trt_int8 output:

(test) coded@coded-desktop:~/PaddleDetection/static$ python deploy/python/infer.py --model_dir=../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/ --image_file=1478896904942573873.jpg  --use_gpu=True --threshold=0.2 --run_mode=trt_int8
WARNING: AVX is not support on your machine. Hence, no_avx core will be imported, It has much worse preformance than avx core.
/home/coded/.local/virtualenvs/test/lib/python3.6/site-packages/paddle/utils/cpp_extension/extension_utils.py:461: UserWarning: Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it.
  "Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it."
-----------  Running Arguments -----------
camera_id: -1
image_file: 1478896904942573873.jpg
model_dir: ../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/
output_dir: output
run_benchmark: False
run_mode: trt_int8
threshold: 0.2
use_gpu: True
video_file: 
------------------------------------------
-----------  Model Configuration -----------
Model Arch: SSD
Use Paddle Executor: False
Transform Order: 
--transform op: Resize
--transform op: Normalize
--transform op: Permute
--------------------------------------------
W0513 19:59:15.472364 18171 analysis_predictor.cc:1145] Deprecated. Please use CreatePredictor instead.
Inference: 33687.761545181274 ms per batch image
class_id:1, confidence:0.4905,left_top:[1101.20,423.82], right_bottom:[1142.83,499.95]
class_id:4, confidence:0.6004,left_top:[1218.18,574.16], right_bottom:[1377.05,638.02]
class_id:4, confidence:0.5076,left_top:[359.73,611.98], right_bottom:[459.73,656.93]
class_id:4, confidence:0.3248,left_top:[650.41,604.29], right_bottom:[697.92,646.41]
class_id:4, confidence:0.2520,left_top:[746.22,601.75], right_bottom:[787.28,639.51]
save result to: output/1478896904942573873.jpg

trt_fp16 inference takes 57.6 ms, yet trt_int8 takes over 30,000 ms, which makes no sense.

Questions:

1. The gap between the two is enormous, and my model went through quantization-aware training. Why?
2. When I evaluate the trained model with tools/eval.py, it reports 30+ FPS, but the inference times here look far too long in either mode. Are the two measured differently?

Environment: Jetson Xavier NX. Model: ssdlite_mobilenet_v3_large_fpn. Dataset: custom dataset.
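As a sanity check on the FPS-vs-latency comparison in question 2: throughput and per-image latency are reciprocals, so 30+ FPS corresponds to roughly 33 ms per image. A minimal sketch (whether eval.py's FPS includes pre/post-processing is an assumption worth verifying, so the two numbers may not be directly comparable):

```python
def fps_to_ms(fps):
    """Convert throughput (frames per second) to per-image latency in ms."""
    return 1000.0 / fps

latency = fps_to_ms(30)  # ~33.3 ms per image
```

That is in the same ballpark as the 57.6 ms trt_fp16 measurement, but nowhere near the 33,687 ms trt_int8 one, which points to something pathological in the INT8 path rather than a mere difference in measurement method.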

yghstill commented 3 years ago

@dengxinlong Try setting use_calib_mode to False and test again. use_calib_mode only takes effect for TensorRT offline (post-training) quantization; otherwise it hurts inference speed. This has been fixed in the code on the develop branch: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/static/deploy/python/infer.py#L420
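A minimal sketch of what that setting amounts to in the Paddle Inference API (the model paths, GPU memory size, and workspace size below are placeholders, not values from this issue): for a quantization-aware-trained model, build the TensorRT engine with use_calib_mode=False so no calibration pass is triggered.

```python
from paddle.inference import Config, PrecisionType, create_predictor

# Placeholder paths for an exported inference model.
config = Config("model_dir/__model__", "model_dir/__params__")
config.enable_use_gpu(200, 0)  # 200 MB initial GPU memory, device id 0
config.enable_tensorrt_engine(
    workspace_size=1 << 30,            # 1 GB TensorRT workspace
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Int8,
    use_static=False,
    use_calib_mode=False,  # False for QAT models; True only for offline quantization
)
predictor = create_predictor(config)
```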

dengxinlong commented 3 years ago

> @dengxinlong Try setting use_calib_mode to False and test again. use_calib_mode only takes effect for TensorRT offline (post-training) quantization; otherwise it hurts inference speed. This has been fixed in the code on the develop branch: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/static/deploy/python/infer.py#L420

I retested. The code is the PaddleDetection release/2.0 branch cloned a few days ago, running inference via static/deploy/python/infer.py. The model is the trained ssdlite_mobilenet_v3_large_fpn, without quantization-aware training, i.e. just the model produced by tools/train.py.

Method: run inference on 2226 images, record the total time, then take the mean, i.e. the average inference time per image.

trt_int8: 29.82 ms
trt_fp16: 26.64 ms
trt_fp32: 27.58 ms

My question: the ordering is still backwards??
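The averaging method described above can be sketched as follows. One detail worth controlling for: the first TensorRT calls can include engine build time and would otherwise dominate the mean, so a warm-up pass is excluded from the timing (the helper name and warm-up count here are illustrative, not from infer.py):

```python
import time

def average_latency_ms(infer, images, warmup=10):
    """Average per-image latency in ms: run a few untimed warm-up
    inferences first, then divide the total timed run by image count."""
    for img in images[:warmup]:
        infer(img)  # warm-up: engine build happens here, not in the timing
    start = time.perf_counter()
    for img in images:
        infer(img)
    total = time.perf_counter() - start
    return total * 1000.0 / len(images)
```

If the 33,687 ms measurement earlier in this thread timed only the very first image, it may largely reflect one-time engine build or calibration cost rather than steady-state INT8 latency.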