YOLOv3_r34 quantized model stops working after export

song121982 commented 2 years ago

Environment:

Windows 11, paddlepaddle-gpu 2.2.2.post112, paddledet 2.1.0, paddleslim 2.1.0

Config file

Based on configs/yolov3/yolov3_r34_270e_coco.yml, as follows:

weights: output/yolov3_r34_270e_coco/model_final

# dataset
metric: VOC
map_type: 11point
num_classes: 1

TrainDataset:
  !VOCDataSet
    dataset_dir: dataset/VSvehicle_vehicle6000
    anno_path: train.txt
    label_list: label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']

EvalDataset:
  !VOCDataSet
    dataset_dir: dataset/VSvehicle_vehicle6000
    anno_path: valid.txt
    label_list: label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']

TestDataset:
  !ImageFolder
    anno_path: dataset/VSvehicle_vehicle6000/label_list.txt

# runtime
use_gpu: true
log_iter: 20
save_dir: output
snapshot_epoch: 2

# optimizer
epoch: 200

LearningRate:
  base_lr: 0.0002
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones:
    - 140
    - 180
  - !LinearWarmup
    start_factor: 0.
    steps: 4000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0005
    type: L2

# r34
architecture: YOLOv3
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet34_pretrained.pdparams
norm_type: sync_bn

YOLOv3:
  backbone: ResNet
  neck: YOLOv3FPN
  yolo_head: YOLOv3Head
  post_process: BBoxPostProcess

ResNet:
  depth: 34
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: false
  norm_decay: 0.

YOLOv3Head:
  anchors: [[10, 13], [16, 30], [33, 23],
            [30, 61], [62, 45], [59, 119],
            [116, 90], [156, 198], [373, 326]]
  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  loss: YOLOv3Loss

YOLOv3Loss:
  ignore_thresh: 0.7
  downsample: [32, 16, 8]
  label_smooth: false

BBoxPostProcess:
  decode:
    name: YOLOBox
    conf_thresh: 0.005
    downsample_ratio: 32
    clip_bbox: true
  nms:
    name: MultiClassNMS
    keep_top_k: 100
    score_threshold: 0.01
    nms_threshold: 0.45
    nms_top_k: 1000

# reader
worker_num: 2
TrainReader:
  inputs_def:
    num_max_boxes: 50
  sample_transforms:
    - Decode: {}
    - Mixup: {alpha: 1.5, beta: 1.5}
    - RandomDistort: {}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {}
    - RandomFlip: {}
  batch_transforms:
    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeBox: {}
    - PadBox: {num_max_boxes: 50}
    - BboxXYXY2XYWH: {}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
  batch_size: 2
  shuffle: true
  drop_last: true
  mixup_epoch: 250
  use_shared_memory: true

EvalReader:
  inputs_def:
    num_max_boxes: 50
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1

TestReader:
  inputs_def:
    image_shape: [3, 608, 608]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1
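
For readers less familiar with the LearningRate block in the config above, here is a minimal sketch of how a linear warmup followed by a piecewise decay is commonly computed. It is an illustration only, not PaddleDetection's own scheduler code. The function name `learning_rate` and the `steps_per_epoch` argument are hypothetical, since the number of iterations per epoch is not stated in the post; the default values are copied from the config.

```python
def learning_rate(step, steps_per_epoch, base_lr=0.0002, gamma=0.1,
                  milestones=(140, 180), warmup_steps=4000, start_factor=0.0):
    """Return the learning rate at a global training step (illustrative)."""
    if step < warmup_steps:
        # Linear warmup: interpolate from start_factor * base_lr up to base_lr.
        alpha = step / warmup_steps
        factor = start_factor * (1 - alpha) + alpha
        return base_lr * factor
    # Piecewise decay: multiply by gamma once for every milestone epoch reached.
    epoch = step // steps_per_epoch
    passed = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** passed


# steps_per_epoch=3000 is an assumed value, used only for this demonstration.
for step in (0, 2000, 4000, 140 * 3000, 180 * 3000):
    print(step, learning_rate(step, steps_per_epoch=3000))
```

With these defaults the rate rises linearly over the first 4000 iterations and is then divided by 10 at epochs 140 and 180.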

Slim config file:

slim: QAT

QAT:
  quant_config: {
    'activation_preprocess_type': 'PACT',
    'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max',
    'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9,
    'quantizable_layer_type': ['Conv2D', 'Linear']}
  print_model: True
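
To make the quant_config options above more concrete, the following sketch shows in plain Python what 8-bit symmetric abs-max quantization with one scale per channel looks like, together with a generic exponential moving average for an activation scale. This is a simplified illustration under stated assumptions, not PaddleSlim's implementation: the function names are hypothetical, and the exact moving-average formula used by the library may differ.

```python
def quantize_abs_max(values, bits=8):
    """Quantize then dequantize values using one symmetric abs-max scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    scale = max(abs(v) for v in values)
    if scale == 0:
        return list(values), 0.0
    # Map to integers in [-qmax, qmax], then back to floats.
    restored = [round(v / scale * qmax) * scale / qmax for v in values]
    return restored, scale


def quantize_channel_wise(weights, bits=8):
    """Apply abs-max quantization per output channel (one scale per row)."""
    return [quantize_abs_max(row, bits) for row in weights]


def ema_scale(previous_scale, batch_abs_max, moving_rate=0.9):
    """Generic exponential moving average of the activation abs-max (assumed form)."""
    return moving_rate * previous_scale + (1 - moving_rate) * batch_abs_max


weights = [[1.0, 0.5], [0.01, -0.02]]  # two channels with very different ranges
for restored, scale in quantize_channel_wise(weights):
    print(scale, restored)
```

Because each row has its own scale, the small-valued second channel keeps its resolution instead of being rounded against the range of the first channel.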

Problem description:

The non-deployment model obtained at the end of training produces normal predictions, but the model exported with export_model detects nothing at inference time, and no error is reported.

Prediction before export: python tools/infer.py -c configs/yolov3/r34_vehicle.yml --infer_file=demo/03239.jpg -o weights=output/r34_vehicle6000_qat/model_final.pdparams The output is normal and objects are detected.

Export command: python tools/export_model.py -c configs/yolov3/r34_vehicle.yml --slim_config configs/slim/quant/r34_vehicle6000_qat.yml -o weights=output/r34_vehicle6000_qat/model_final.pdparams

Running prediction with the exported model using the command below has no effect; no bounding boxes are generated: python deploy/python/infer.py --model_dir=output_inference/r34_vehicle6000_qat --image_file=demo/03239.jpg --use_gpu=True

Prediction log:

yghstill commented 2 years ago

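@song121982 After exporting a quantized model, please use TensorRT for prediction on GPU by setting run_mode='trt_int8'. (This requires a Paddle build that includes TensorRT.) See the documentation for details: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/TENSOR_RT.md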
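
As a rough sketch of this suggestion, the snippet below assembles the deploy command from the original post with the run mode added. The helper name `build_infer_command` is hypothetical, the paths are the ones quoted in the thread, and passing the setting as `--run_mode=trt_int8` on the command line is an assumption based on the reply above; check the linked document for the options supported by your PaddleDetection version.

```python
def build_infer_command(model_dir, image_file, run_mode="trt_int8", use_gpu=True):
    """Build the argument list for deploy/python/infer.py (illustrative helper)."""
    return [
        "python",
        "deploy/python/infer.py",
        f"--model_dir={model_dir}",
        f"--image_file={image_file}",
        f"--use_gpu={use_gpu}",
        f"--run_mode={run_mode}",
    ]


command = build_infer_command(
    "output_inference/r34_vehicle6000_qat",  # model directory from the post
    "demo/03239.jpg",                        # test image from the post
)
# Print the command instead of running it, so it can be reviewed first.
print(" ".join(command))
```

The list form can be handed to `subprocess.run(command, check=True)` from the repository root once the command looks right.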

song121982 commented 2 years ago

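@yghstill Previously, exported quantized models could still run prediction on devices without TensorRT, even without setting run_mode='trt_int8'; they just ran without acceleration. I don't know why this exported model produces no detections at all.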

song121982 commented 2 years ago

@yghstill I downloaded the Python inference library named cuda11.2_cudnn8.1_avx_mkl-trt8.0.1.6 again from https://www.paddlepaddle.org.cn/inference/v2.2/user_guides/download_lib.html#windows and added TensorRT's lib directory to the environment variables. Both pip install -r requirements.txt and python setup.py install completed, but the framework test reports an error: python ppdet/modeling/tests/test_architectures.py

PS E:\syg\PaddleDetection-release-2.3> python ppdet/modeling/tests/test_architectures.py
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0402 10:33:58.146404 37340 tensorrt.cc:56] You are using Paddle compiled with TensorRT, but TensorRT dynamic library is not found. Ignore this if TensorRT is not needed.
The TensorRT that Paddle depends on is not configured correctly.
  Suggestions:
  1. Check if the TensorRT is installed correctly and its version is matched with paddlepaddle you installed.
  2. Configure environment variables as follows:
  - Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`
  - Windows: set PATH by `set PATH=XXX;%PATH%`
  - Mac: set  DYLD_LIBRARY_PATH by `export DYLD_LIBRARY_PATH=...`
E0402 10:33:58.146404 37340 port.h:50] Load symbol getPluginRegistry failed.
Error: Can not import avx core while this file exists: D:\anaconda\envs\paddle-trt\lib\site-packages\paddle\fluid\core_avx.pyd
Traceback (most recent call last):
  File "ppdet/modeling/tests/test_architectures.py", line 20, in <module>
    import ppdet
  File "D:\anaconda\envs\paddle-trt\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\__init__.py", line 15, in <module>
    from . import (core, data, engine, modeling, model_zoo, optimizer, metrics,
  File "D:\anaconda\envs\paddle-trt\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\data\__init__.py", line 15, in <module>
    from . import source
  File "D:\anaconda\envs\paddle-trt\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\data\source\__init__.py", line 15, in <module>
    from . import coco
  File "D:\anaconda\envs\paddle-trt\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\data\source\coco.py", line 18, in <module>
    from .dataset import DetDataset
  File "D:\anaconda\envs\paddle-trt\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\data\source\dataset.py", line 22, in <module>
    from paddle.io import Dataset
  File "D:\anaconda\envs\paddle-trt\lib\site-packages\paddle\__init__.py", line 25, in <module>
    from .fluid import monkey_patch_variable
  File "D:\anaconda\envs\paddle-trt\lib\site-packages\paddle\fluid\__init__.py", line 36, in <module>
    from . import framework
  File "D:\anaconda\envs\paddle-trt\lib\site-packages\paddle\fluid\framework.py", line 37, in <module>
    from . import core
  File "D:\anaconda\envs\paddle-trt\lib\site-packages\paddle\fluid\core.py", line 294, in <module>
    raise e
  File "D:\anaconda\envs\paddle-trt\lib\site-packages\paddle\fluid\core.py", line 256, in <module>
    from .core_avx import *
ImportError: DLL load failed while importing core_avx: 动态链接库(DLL)初始化例程失败。
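
The log above suggests checking that the TensorRT directory is on PATH. One way to verify that is to scan each PATH entry for the library file, as in the minimal sketch below. The helper name `find_library_dirs` is hypothetical, and the file name `nvinfer.dll` is an assumption based on how TensorRT 8 names its core library on Windows.

```python
import os


def find_library_dirs(filename, path_value=None):
    """Return the PATH directories that contain the given file."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # Skip empty entries produced by leading or doubled separators.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = find_library_dirs("nvinfer.dll")  # assumed TensorRT 8 file name
    print(found if found else "nvinfer.dll was not found in any PATH directory")
```

An empty result means the directory added to the environment variables is not the one that holds the library, or that the shell was not restarted after the change.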

song121982 commented 2 years ago

I also tried the win-cuda11.2-cudnn8.2-mkl-vs2017-avx inference library downloaded from https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release. It also gets stuck at python ppdet/modeling/tests/test_architectures.py, this time with a different error:

PS E:\syg\PaddleDetection-release-2.3> python ppdet/modeling/tests/test_architectures.py
D:\anaconda\lib\site-packages\win32\lib\pywintypes.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp, sys, os
Traceback (most recent call last):
  File "ppdet/modeling/tests/test_architectures.py", line 20, in <module>
    import ppdet
  File "D:\anaconda\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\__init__.py", line 15, in <module>
    from . import (core, data, engine, modeling, model_zoo, optimizer, metrics,
  File "D:\anaconda\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\data\__init__.py", line 16, in <module>
    from . import transform
  File "D:\anaconda\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\data\transform\__init__.py", line 15, in <module>
    from . import operators
  File "D:\anaconda\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\data\transform\operators.py", line 44, in <module>
    from ppdet.modeling import bbox_utils
  File "D:\anaconda\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\modeling\__init__.py", line 25, in <module>
    from . import architectures
  File "D:\anaconda\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\modeling\architectures\__init__.py", line 21, in <module>
    from . import deepsort
  File "D:\anaconda\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\modeling\architectures\deepsort.py", line 22, in <module>
    from ppdet.modeling.mot.utils import Detection, get_crops, scale_coords, clip_box
  File "D:\anaconda\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\modeling\mot\__init__.py", line 15, in <module>
    from . import matching
  File "D:\anaconda\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\modeling\mot\matching\__init__.py", line 15, in <module>
    from . import jde_matching
  File "D:\anaconda\lib\site-packages\paddledet-2.3.0-py3.8.egg\ppdet\modeling\mot\matching\jde_matching.py", line 18, in <module>
    import lap
  File "D:\anaconda\lib\site-packages\lap\__init__.py", line 25, in <module>
    from ._lapjv import (
  File "__init__.pxd", line 199, in init lap._lapjv
ValueError: numpy.ndarray has the wrong size, try recompiling. Expected 80, got 88

wangxinit commented 2 years ago

This is a numpy version problem; install numpy >= 1.22.3.

Uninstall numpy with pip uninstall numpy -y, then reinstall it with pip install numpy.
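
The "numpy.ndarray has the wrong size" message generally means that a compiled extension (here `lap`) was built against a numpy whose array layout differs from the one currently installed. A small sketch for telling this case apart from other import failures is shown below. The helper `diagnose_import` and its return labels are hypothetical, and the matched message fragments are taken from the traceback above plus one commonly seen variant.

```python
import importlib

# Fragments of error text that point to a numpy binary incompatibility.
ABI_HINTS = (
    "numpy.ndarray has the wrong size",
    "numpy.ndarray size changed",
)


def diagnose_import(module_name, importer=importlib.import_module):
    """Try to import a module and classify the outcome (illustrative)."""
    try:
        importer(module_name)
    except ValueError as exc:
        if any(hint in str(exc) for hint in ABI_HINTS):
            return "abi-mismatch"
        raise  # some other ValueError: do not hide it
    except ImportError:
        return "import-error"
    return "ok"


if __name__ == "__main__":
    print("lap:", diagnose_import("lap"))
```

A result of `abi-mismatch` points to the numpy reinstall described above, while `import-error` means the module is missing or failed to load for another reason.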

song121982 commented 2 years ago

@wangxinit Thanks for the answer. This is the same problem as the one described in https://github.com/PaddlePaddle/PaddleDetection/issues/5626#issue-1195795538. I reinstalled numpy as you suggested, but got:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
paddlepaddle-gpu 2.2.2.post112 requires numpy<=1.19.3,>=1.13; python_version >= "3.5" and platform_system == "Windows", but you have numpy 1.22.3 which is incompatible.

The exported model still detects nothing. I'll retrain and see.
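
The pip message above is a declared-requirement check: paddlepaddle-gpu 2.2.2.post112 lists `numpy<=1.19.3,>=1.13` on Windows, and 1.22.3 falls outside that range. The sketch below reproduces the comparison for plain dotted versions. It is a simplified stand-in for pip's resolver, not its actual logic, and the helper names are hypothetical; real tooling should use the `packaging` library, which also handles pre-releases and other version forms.

```python
import operator

# Two-character operators come first so "<=" is not mistaken for "<".
OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text, width=4):
    """Turn '1.19.3' into (1, 19, 3, 0); handles numeric dotted versions only."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, requirement):
    """Check a version against a comma-separated list of bounds."""
    current = parse_version(version)
    for clause in requirement.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS.items():
            if clause.startswith(symbol):
                if not compare(current, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


requirement = "<=1.19.3,>=1.13"  # from the pip message above
for candidate in ("1.19.3", "1.22.3"):
    print(candidate, satisfies(candidate, requirement))
```

A failed check like this only means the installed version is outside the range the package declares; it does not by itself show that the package will fail at runtime.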

wangxinit commented 2 years ago

> @wangxinit Thanks for the answer. This is the same problem as the one described in #5626 (comment). I reinstalled numpy as you suggested, but got:
> ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
> paddlepaddle-gpu 2.2.2.post112 requires numpy<=1.19.3,>=1.13; python_version >= "3.5" and platform_system == "Windows", but you have numpy 1.22.3 which is incompatible.
> The exported model still detects nothing. I'll retrain and see.

I ran into this error before: "ValueError: numpy.ndarray has the wrong size, try recompiling. Expected 80, got 88". I installed paddlepaddle-gpu first and then PaddleDetection.

Finally I uninstalled numpy 1.19 and installed numpy 1.22. pip still reports the dependency error, but PaddleDetection runs normally.

song121982 commented 2 years ago

@wangxinit Thanks, it works now.

FengYi-67 commented 2 years ago

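Has the problem of a model from quantization-aware training producing no predictions after export been solved? I'm running into the same situation.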

PaddlePaddle / PaddleDetection

YOLOv3_r34 quantized model stops working after export #5558