Inconsistent speeds across different interfaces

CVdandelion commented 4 years ago

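I used slim/train.py to run quantization-aware training on a model; the parameters of each layer are saved as separate files. I want to measure the model's detection speed. With the command python -u tools/eval.py -c configs/yolov3_darknet_voc.yml the reported speed is 24 fps, but with the command python slim/quantization/eval.py --not_quant_pattern yolo_output -c ./configs/yolov3_darknet_voc.yml the reported speed is 19 fps. What causes this discrepancy, and which number should I trust?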

python -u tools/eval.py -c configs/yolov3_darknet_voc.yml
2020-10-23 18:49:41,038-INFO: places would be ommited when DataLoader is not iterable
W1023 18:49:41.058459 11913 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.2, Runtime API Version: 10.0
W1023 18:49:41.060336 11913 device_context.cc:260] device: 0, cuDNN Version: 7.6.
2020-10-23 18:49:42,123-WARNING: output/yolov3_darknet_voc/model_final.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
/home/w/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/io.py:1998: UserWarning: This list is not set, Because of Paramerter not found in program. There are: elementwise_add_19.tmp_0.scale leaky_relu_9.tmp_0.scale elementwise_add_6.tmp_0.scale leaky_relu_55.tmp_0.scale leaky_relu_43.tmp_0.scale elementwise_add_14.tmp_0.scale leaky_relu_48.tmp_0.scale leaky_relu_46.tmp_0.scale leaky_relu_61.tmp_0.scale elementwise_add_16.tmp_0.scale leaky_relu_50.tmp_0.scale leaky_relu_35.tmp_0.scale leaky_relu_68.tmp_0.scale leaky_relu_29.tmp_0.scale leaky_relu_31.tmp_0.scale leaky_relu_1.tmp_0.scale leaky_relu_2.tmp_0.scale elementwise_add_2.tmp_0.scale elementwise_add_22.tmp_0.scale elementwise_add_20.tmp_0.scale concat_1.tmp_0.scale leaky_relu_18.tmp_0.scale @LR_DECAY_COUNTER@ leaky_relu_60.tmp_0.scale elementwise_add_0.tmp_0.scale leaky_relu_70.tmp_0.scale leaky_relu_67.tmp_0.scale elementwise_add_11.tmp_0.scale elementwise_add_10.tmp_0.scale leaky_relu_27.tmp_0.scale concat_0.tmp_0.scale elementwise_add_7.tmp_0.scale leaky_relu_56.tmp_0.scale elementwise_add_21.tmp_0.scale leaky_relu_37.tmp_0.scale elementwise_add_18.tmp_0.scale elementwise_add_9.tmp_0.scale leaky_relu_22.tmp_0.scale elementwise_add_1.tmp_0.scale elementwise_add_13.tmp_0.scale leaky_relu_10.tmp_0.scale leaky_relu_16.tmp_0.scale leaky_relu_41.tmp_0.scale leaky_relu_62.tmp_0.scale leaky_relu_12.tmp_0.scale elementwise_add_4.tmp_0.scale leaky_relu_54.tmp_0.scale elementwise_add_17.tmp_0.scale leaky_relu_26.tmp_0.scale leaky_relu_69.tmp_0.scale leaky_relu_44.tmp_0.scale leaky_relu_39.tmp_0.scale leaky_relu_4.tmp_0.scale leaky_relu_52.tmp_0.scale leaky_relu_14.tmp_0.scale leaky_relu_7.tmp_0.scale leaky_relu_53.tmp_0.scale elementwise_add_5.tmp_0.scale leaky_relu_5.tmp_0.scale image.scale leaky_relu_63.tmp_0.scale leaky_relu_0.tmp_0.scale leaky_relu_24.tmp_0.scale leaky_relu_20.tmp_0.scale elementwise_add_15.tmp_0.scale elementwise_add_8.tmp_0.scale elementwise_add_12.tmp_0.scale leaky_relu_59.tmp_0.scale elementwise_add_3.tmp_0.scale leaky_relu_66.tmp_0.scale leaky_relu_33.tmp_0.scale
format(" ".join(unused_para_list)))
2020-10-23 18:49:43,081-INFO: Test iter 0
2020-10-23 18:50:03,113-INFO: Test finish iter 63
2020-10-23 18:50:03,113-INFO: Total number of images: 499, inference time: 24.100337363180035 fps.
2020-10-23 18:50:03,113-INFO: Start evaluate...
2020-10-23 18:50:06,909-INFO: Accumulating evaluatation results...
2020-10-23 18:50:06,940-INFO: mAP(0.50, 11point) = 36.35

python slim/quantization/eval.py --not_quant_pattern yolo_output -c ./configs/yolov3_darknet_voc.yml
020-10-23 18:45:55,976-INFO: places would be ommited when DataLoader is not iterable
2020-10-23 18:45:55,977-INFO: quant_aware config {'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', 'weight_bits': 8, 'activation_bits': 8, 'not_quant_pattern': ['yolo_output'], 'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'], 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, 'for_tensorrt': False, 'is_full_quantize': False}
W1023 18:45:58.634968 11767 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.2, Runtime API Version: 10.0
W1023 18:45:58.637008 11767 device_context.cc:260] device: 0, cuDNN Version: 7.6.
2020-10-23 18:45:59,347-WARNING: output/yolov3_darknet_voc/model_final.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
/home/w/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/io.py:1998: UserWarning: This list is not set, Because of Paramerter not found in program. There are: @LR_DECAY_COUNTER@
format(" ".join(unused_para_list)))
2020-10-23 18:45:59,641-INFO: convert config {'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max', 'weight_bits': 8, 'activation_bits': 8, 'not_quant_pattern': ['yolo_output'], 'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'], 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9, 'for_tensorrt': False, 'is_full_quantize': False}
2020-10-23 18:46:02,941-INFO: Test iter 0
2020-10-23 18:46:28,044-INFO: Test finish iter 63
2020-10-23 18:46:28,044-INFO: Total number of images: 499, inference time: 19.26849652404211 fps.
2020-10-23 18:46:28,044-INFO: Start evaluate...
2020-10-23 18:46:31,817-INFO: Accumulating evaluatation results...
2020-10-23 18:46:31,847-INFO: mAP(0.50, 11point) = 36.82

qingqing01 commented 4 years ago

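@CVdandelion

A model trained with quantization must be evaluated with slim/quantization/eval.py. That script inserts quantization-related ops into the network graph, so the FPS is expected to drop.
To get actual INT8 speedup from the quantized model: on mobile devices, use Paddle Lite's INT8 acceleration; on servers or Jetson-series devices, run prediction through Paddle Inference with TensorRT.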
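
To illustrate why the added quantization ops cost time rather than save it, here is a minimal sketch of what an abs-max fake-quantization step computes: each value is scaled, rounded onto a signed 8-bit grid, and then scaled back to floating point. The function name `fake_quant_abs_max` is hypothetical and this is plain Python for illustration, not PaddleSlim's kernel. It uses one scale for the whole list, whereas the `channel_wise_abs_max` setting in the log above keeps one scale per output channel.

```python
def fake_quant_abs_max(values, bits=8):
    """Quantize values onto a signed integer grid using an abs-max scale,
    then dequantize back to float (illustrative sketch only)."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    scale = max(abs(v) for v in values)        # abs-max scale over all values
    if scale == 0:
        return list(values), scale             # nothing to quantize
    quantized = [round(v / scale * qmax) for v in values]   # integer codes
    dequantized = [q * scale / qmax for q in quantized]     # back to float
    return dequantized, scale


weights = [0.5, -1.0, 0.25, 0.1]
approx, scale = fake_quant_abs_max(weights)
for w, a in zip(weights, approx):
    print(f"{w:+.4f} -> {a:+.4f}")
```

Every such step is extra floating-point work layered on top of the original convolution, which is consistent with the lower FPS reported by slim/quantization/eval.py. The speedup only appears once a backend executes the layers in real INT8 arithmetic.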

heavengate commented 4 years ago

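Hi, what is the purpose of measuring the model's detection speed on your side? The FPS reported by eval.py is affected by framework warm-up, data loading, and similar overheads. If you want to measure the speed you would see in deployment, exporting the model with export_model.py and then timing it with deploy/python/infer.py is more accurate. For reference, see https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/configs/ppyolo/README_cn.md#4-推理部署与benchmark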
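
As a rough sketch of the measurement idea described here (run a few untimed warm-up iterations, then time only repeated inference on an input that is already loaded), something like the following could be used. The names `measure_fps` and `dummy_predict` are hypothetical stand-ins, not functions from PaddleDetection; `dummy_predict` would be replaced by a call to the real predictor.

```python
import time


def measure_fps(run_once, num_warmup=10, num_runs=100, clock=time.perf_counter):
    """Return calls per second for run_once, excluding warm-up iterations."""
    for _ in range(num_warmup):      # warm-up calls are executed but not timed
        run_once()
    start = clock()
    for _ in range(num_runs):        # only these calls contribute to the result
        run_once()
    elapsed = clock() - start
    return num_runs / elapsed


def dummy_predict():
    """Hypothetical stand-in for one predictor call on a preloaded image."""
    sum(i * i for i in range(1000))


fps = measure_fps(dummy_predict, num_warmup=5, num_runs=50)
print(f"{fps:.1f} fps")
```

On a GPU, the timed call should include fetching the outputs back to the host, so that asynchronously launched kernels have actually finished before the clock is read.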

qingqing01 commented 4 years ago

@CVdandelion Do you have any remaining questions? If not, please close the issue.

CVdandelion commented 4 years ago

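@qingqing01 OK. One more question: eval.py runs at 35 fps, while deploy/python/infer.py only reaches 22 fps. Is that normal? As mentioned above, the FPS in eval.py is affected by framework warm-up, data loading, and so on, so shouldn't eval.py be the slower one?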

qingqing01 commented 3 years ago

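@CVdandelion eval.py runs over the full dataset, where image sizes vary and preprocessing time is not uniform, and its reader is also accelerated with multiple threads/processes. deploy/python/infer.py measures a fixed image. Differences in image size, preprocessing, and so on therefore lead to some difference in the numbers.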
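
One way to see where such a gap comes from is to time preprocessing and inference separately over a set of inputs of different sizes. The sketch below assumes two hypothetical callables, `preprocess` and `infer`, which stand in for the real pipeline stages; neither is a PaddleDetection API.

```python
import time


def profile_stages(samples, preprocess, infer, clock=time.perf_counter):
    """Accumulate preprocessing and inference time separately over samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    pre_total = 0.0
    infer_total = 0.0
    for sample in samples:
        t0 = clock()
        tensor = preprocess(sample)   # e.g. decode, resize, normalize
        t1 = clock()
        infer(tensor)                 # e.g. one forward pass
        t2 = clock()
        pre_total += t1 - t0
        infer_total += t2 - t1
    n = len(samples)
    return {
        "preprocess_ms": 1000 * pre_total / n,
        "infer_ms": 1000 * infer_total / n,
        "end_to_end_fps": n / (pre_total + infer_total),
    }


# Hypothetical workload: "images" of different side lengths.
sizes = [64, 128, 256]
report = profile_stages(
    sizes,
    preprocess=lambda side: [0.0] * (side * side),
    infer=lambda pixels: sum(pixels),
)
print(report)
```

If the per-image preprocessing time dominates or varies a lot with input size, a script that runs preprocessing serially on a single image will report a different FPS than one that overlaps preprocessing with inference through a multi-worker reader.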

PaddlePaddle / PaddleDetection

Inconsistent speeds across different interfaces #1597