PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

About quantization: the frame rate measured after quantization is actually lower?? #2924

Closed dengxinlong closed 2 years ago

dengxinlong commented 3 years ago

[screenshot: eval results of the quantization-aware-trained model]

The above is the result from quantization-aware training. The command was:
python slim/quantization/eval.py -c configs/ssd/ssdlite_mobilenet_v3_large_fpn.yml -o weights=experiment/best_model/

[screenshot: eval results of the non-quantized model]

This is the result from the model without quantization.

As you can see, the quantization-aware model runs at 27.16 FPS while the non-quantized model runs at 35.34 FPS, so the frame rate is actually lower after quantization. Am I quantizing the wrong way??

yghstill commented 3 years ago

@dengxinlong If you run eval directly, the model still computes in FP32, so there is no speedup. On top of that, the accuracy drop after quantization increases NMS time, making inference even slower. To benchmark the quantized model, please use Paddle-Lite or TensorRT int8 inference.
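For clarity, the flow being suggested is: first export the quantization-aware-trained checkpoint to a deployable inference model, then benchmark it with real int8 kernels rather than eval.py's simulated quantization. A sketch of the export step, assuming the static branch ships a slim/quantization/export_model.py counterpart to eval.py (the script path and --output_dir flag are assumptions, not confirmed in this thread):

```
# Sketch: export the QAT checkpoint to an inference model for deployment.
# Script path and flags are assumptions based on the static branch layout.
python slim/quantization/export_model.py \
    -c configs/ssd/ssdlite_mobilenet_v3_large_fpn.yml \
    -o weights=experiment/best_model/ \
    --output_dir=inference_model
```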

dengxinlong commented 3 years ago

> @dengxinlong If you run eval directly, the model still computes in FP32, so there is no speedup. On top of that, the accuracy drop after quantization increases NMS time, making inference even slower. To benchmark the quantized model, please use Paddle-Lite or TensorRT int8 inference.

What I mainly want is to do quantization on a Jetson Xavier NX, but your documentation seems to target Android.

yghstill commented 3 years ago

@dengxinlong On Jetson, just run inference in TensorRT int8 mode.
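Concretely, that means running the exported model through the deploy script with run_mode set to trt_int8, mirroring the command used later in this thread (the model_dir and image path below are placeholders):

```
python deploy/python/infer.py \
    --model_dir=inference_model/ssdlite_mobilenet_v3_large_fpn \
    --image_file=test.jpg --use_gpu=True --run_mode=trt_int8
```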

dengxinlong commented 3 years ago

> @dengxinlong On Jetson, just run inference in TensorRT int8 mode.

Thanks for your reply. The documentation you pointed to uses infer.py, but what I want to run is eval.

dengxinlong commented 3 years ago

> @dengxinlong On Jetson, just run inference in TensorRT int8 mode.

Also, since I mainly want to test the model with quantization: does Paddle-Lite support the Jetson Xavier NX??

dengxinlong commented 3 years ago

> @dengxinlong On Jetson, just run inference in TensorRT int8 mode.

I ran into a problem compiling and installing Paddle-Lite on the Jetson Xavier NX. Does that mean Paddle-Lite cannot be installed on the Jetson Xavier NX? Error message: unrecognized command line option '-m16'
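An aside on the error: '-m16' is an x86 compiler option, so the failure suggests an x86-targeted build being attempted on the Jetson's ARM CPU. Building Paddle-Lite natively on ARM Linux would go through its dedicated build script instead; a sketch, assuming a Paddle-Lite checkout recent enough to ship lite/tools/build_linux.sh (script name and flags should be checked against your Paddle-Lite version):

```
# Sketch: native ARM-Linux (armv8) build of Paddle-Lite on the Jetson.
# Script name and flags are assumptions for recent Paddle-Lite releases.
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
./lite/tools/build_linux.sh --arch=armv8
```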

dengxinlong commented 3 years ago

> @dengxinlong If you run eval directly, the model still computes in FP32, so there is no speedup. On top of that, the accuracy drop after quantization increases NMS time, making inference even slower. To benchmark the quantized model, please use Paddle-Lite or TensorRT int8 inference.

You said to use TensorRT directly on the Jetson Xavier NX, but after exporting the model and running inference, I get an error saying TensorRT int8 is not supported.

Traceback (most recent call last):
  File "deploy/python/infer.py", line 601, in <module>
    main()
  File "deploy/python/infer.py", line 536, in main
    config, FLAGS.model_dir, use_gpu=FLAGS.use_gpu, run_mode=FLAGS.run_mode)
  File "deploy/python/infer.py", line 78, in __init__
    use_gpu=use_gpu)
  File "deploy/python/infer.py", line 397, in load_predictor
    raise ValueError("TensorRT int8 mode is not supported now, "
ValueError: TensorRT int8 mode is not supported now, please use trt_fp32 or trt_fp16 instead.

yghstill commented 3 years ago

@dengxinlong Please use the latest code from the release/2.0 or develop branch: https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/deploy/python/infer.py
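Switching to one of those branches, assuming a standard git clone of PaddleDetection:

```
cd PaddleDetection
git fetch origin
git checkout release/2.0    # or: git checkout develop
```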

dengxinlong commented 3 years ago

> @dengxinlong Please use the latest code from the release/2.0 or develop branch: https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/deploy/python/infer.py

Can the configuration files (.yml files) from release/2.0-rc be used on release/2.0??

dengxinlong commented 3 years ago

> @dengxinlong Please use the latest code from the release/2.0 or develop branch: https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/deploy/python/infer.py

One important question: can the static-graph code in release/2.0 be used with TensorRT int8??

yghstill commented 3 years ago

> @dengxinlong Please use the latest code from the release/2.0 or develop branch: https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/deploy/python/infer.py
>
> One important question: can the static-graph code in release/2.0 be used with TensorRT int8??

The static-graph code can use TensorRT int8.
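For context, in the static-graph deploy code enabling TensorRT int8 ultimately comes down to a Paddle Inference config call. A minimal sketch of that mechanism (model paths are hypothetical, and the exact enable_tensorrt_engine signature can vary slightly across Paddle releases):

```python
# Sketch: enabling TensorRT int8 through the Paddle Inference API.
# Paths are hypothetical; check the signature against your Paddle version.
from paddle.inference import Config, PrecisionType, create_predictor

config = Config("inference_model/__model__", "inference_model/__params__")
config.enable_use_gpu(200, 0)             # 200 MB initial GPU memory, device 0
config.enable_tensorrt_engine(
    workspace_size=1 << 30,               # 1 GB TensorRT workspace
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Int8,    # request int8 kernels
    use_static=False,
    use_calib_mode=False)                 # QAT models carry scales, so no calibration pass
predictor = create_predictor(config)
```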

dengxinlong commented 3 years ago

> @dengxinlong If you run eval directly, the model still computes in FP32, so there is no speedup. On top of that, the accuracy drop after quantization increases NMS time, making inference even slower. To benchmark the quantized model, please use Paddle-Lite or TensorRT int8 inference.

Sigh, could your documentation be written a bit more clearly? It feels really disorganized!!

dengxinlong commented 3 years ago

> @dengxinlong Please use the latest code from the release/2.0 or develop branch: https://github.com/PaddlePaddle/PaddleDetection/blob/develop/static/deploy/python/infer.py
>
> One important question: can the static-graph code in release/2.0 be used with TensorRT int8??
>
> The static-graph code can use TensorRT int8.

Hi, following your earlier suggestion I used deploy/python/infer.py, but the one under static/. Run output:

(test) coded@coded-desktop:~/PaddleDetection/static$ python deploy/python/infer.py --model_dir=../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/ --image_file=1478896904942573873.jpg  --use_gpu=True --threshold=0.2 --run_mode=trt_fp16
WARNING: AVX is not support on your machine. Hence, no_avx core will be imported, It has much worse preformance than avx core.
/home/coded/.local/virtualenvs/test/lib/python3.6/site-packages/paddle/utils/cpp_extension/extension_utils.py:461: UserWarning: Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it.
  "Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it."
-----------  Running Arguments -----------
camera_id: -1
image_file: 1478896904942573873.jpg
model_dir: ../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/
output_dir: output
run_benchmark: False
run_mode: trt_fp16
threshold: 0.2
use_gpu: True
video_file: 
------------------------------------------
-----------  Model Configuration -----------
Model Arch: SSD
Use Paddle Executor: False
Transform Order: 
--transform op: Resize
--transform op: Normalize
--transform op: Permute
--------------------------------------------
W0513 20:54:04.807541 18561 analysis_predictor.cc:1145] Deprecated. Please use CreatePredictor instead.
Inference: 57.614803314208984 ms per batch image
class_id:1, confidence:0.4840,left_top:[1101.23,423.78], right_bottom:[1142.89,499.94]
class_id:4, confidence:0.6002,left_top:[1218.22,574.18], right_bottom:[1377.13,638.06]
class_id:4, confidence:0.5065,left_top:[359.96,611.97], right_bottom:[460.14,656.98]
class_id:4, confidence:0.3243,left_top:[650.40,604.32], right_bottom:[697.91,646.41]
class_id:4, confidence:0.2525,left_top:[746.66,601.77], right_bottom:[787.74,639.54]
save result to: output/1478896904942573873.jpg

The above is trt_fp16, but the model has been through quantization-aware training.

Below is the trt_int8 output:

(test) coded@coded-desktop:~/PaddleDetection/static$ python deploy/python/infer.py --model_dir=../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/ --image_file=1478896904942573873.jpg  --use_gpu=True --threshold=0.2 --run_mode=trt_int8
WARNING: AVX is not support on your machine. Hence, no_avx core will be imported, It has much worse preformance than avx core.
/home/coded/.local/virtualenvs/test/lib/python3.6/site-packages/paddle/utils/cpp_extension/extension_utils.py:461: UserWarning: Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it.
  "Not found CUDA runtime, please use `export CUDA_HOME= XXX` to specific it."
-----------  Running Arguments -----------
camera_id: -1
image_file: 1478896904942573873.jpg
model_dir: ../../PaddleDetection_old/bestModel/ssdlite_mobilenet_v3_large_fpn/
output_dir: output
run_benchmark: False
run_mode: trt_int8
threshold: 0.2
use_gpu: True
video_file: 
------------------------------------------
-----------  Model Configuration -----------
Model Arch: SSD
Use Paddle Executor: False
Transform Order: 
--transform op: Resize
--transform op: Normalize
--transform op: Permute
--------------------------------------------
W0513 19:59:15.472364 18171 analysis_predictor.cc:1145] Deprecated. Please use CreatePredictor instead.
Inference: 33687.761545181274 ms per batch image
class_id:1, confidence:0.4905,left_top:[1101.20,423.82], right_bottom:[1142.83,499.95]
class_id:4, confidence:0.6004,left_top:[1218.18,574.16], right_bottom:[1377.05,638.02]
class_id:4, confidence:0.5076,left_top:[359.73,611.98], right_bottom:[459.73,656.93]
class_id:4, confidence:0.3248,left_top:[650.41,604.29], right_bottom:[697.92,646.41]
class_id:4, confidence:0.2520,left_top:[746.22,601.75], right_bottom:[787.28,639.51]
save result to: output/1478896904942573873.jpg

The trt_fp16 inference time is 57.6 ms, while trt_int8 takes over 30,000 ms, which really makes no sense. Questions:

  1. The gap between the two is huge, and my model has already been through quantization-aware training. Why?
  2. When I evaluate with tools/eval.py after training, the FPS is above 30, yet the inference time here seems far too long in either mode. Is the measurement methodology different?

Environment: Jetson Xavier NX; test model: ssdlite-mobilenetv3_large_fpn; dataset: custom dataset.
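One likely contributor to the 33-second number, worth ruling out: the first TensorRT run builds the engine (and runs calibration when calibration mode is enabled), which can take a very long time, while tools/eval.py reports an FPS averaged over many images. A hedged benchmarking sketch that discards warm-up iterations before timing (run_inference is a hypothetical stand-in for the actual predict call):

```python
# Sketch: time steady-state latency only; the warm-up loop absorbs
# one-off costs such as TensorRT engine building and int8 calibration.
import time

def benchmark(run_inference, warmup=10, iters=50):
    for _ in range(warmup):
        run_inference()                   # engine build happens on early runs
    start = time.time()
    for _ in range(iters):
        run_inference()
    latency_ms = (time.time() - start) / iters * 1000
    print(f"avg latency: {latency_ms:.2f} ms ({1000 / latency_ms:.2f} FPS)")
```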
thorory commented 3 years ago

Same question here: on a Jetson Xavier NX with a DarkNet53 model, trt_int8 after quantization-aware training runs at roughly half the frame rate of trt_fp16 before quantization. I used the default configuration under config/slim.

paddle-bot-old[bot] commented 2 years ago

Since this issue has not been updated for more than three months, it will be closed. If it is not solved or there is a follow-up, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.