countryhotel commented 6 months ago

I followed the steps in the official guide (https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/applications/%E8%BD%BB%E9%87%8F%E7%BA%A7%E8%BD%A6%E7%89%8C%E8%AF%86%E5%88%AB.md) to fine-tune the ch_PP-OCRv4_det_train model on a license plate dataset.

The demo in the official documentation currently uses the ch_PP-OCRv3_det_distill_train model.

We want to use ch_PP-OCRv4_det_train instead. With the config file "ch_PP-OCRv4_det_student.yml", training prints the following warnings:

    [2024/02/21 10:08:23] ppocr WARNING: The pretrained params backbone.conv1.hardswish.scale not in model
    [2024/02/21 10:08:23] ppocr WARNING: The pretrained params backbone.conv1.hardswish.bias not in model

What needs to be adjusted, or which config file is the right one to use? (I also tried ch_PP-OCRv4_det_teacher.yml and ch_PP-OCRv4_det_cml.yml, which ship in the config folder of PaddleOCR-release-2.7, and both produced even more errors.)
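One way to narrow down warnings like these is to compare the parameter names stored in the checkpoint with the names the model built from the config actually has. The sketch below is a minimal illustration using toy key lists; `diff_param_keys` is a hypothetical helper, and the reading that the warning means "this key is in the checkpoint but not in the model" is an assumption based on the wording of the log line.

```python
def diff_param_keys(pretrained_keys, model_keys):
    """Compare checkpoint parameter names with model parameter names.

    Returns (unused, missing):
      unused  - names present only in the checkpoint (assumed to trigger
                the "pretrained params ... not in model" warning)
      missing - names present only in the model (left at their initial values)
    """
    pretrained, model = set(pretrained_keys), set(model_keys)
    return sorted(pretrained - model), sorted(model - pretrained)


# Toy stand-ins for illustration only. With real objects one would pass
# paddle.load(path).keys() and model.state_dict().keys() instead.
pretrained_keys = [
    "backbone.conv1.conv.weight",
    "backbone.conv1.hardswish.scale",
    "backbone.conv1.hardswish.bias",
]
model_keys = ["backbone.conv1.conv.weight"]

unused, missing = diff_param_keys(pretrained_keys, model_keys)
print("only in checkpoint:", unused)
print("only in model:", missing)
```

If only a handful of names differ, the checkpoint and the config mostly agree; a long list on either side suggests the config describes a different architecture than the one the checkpoint was saved from.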

ymy1005 commented 6 months ago

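I ran into the same problem. With ch_PP-OCRv4_det_train as the pretrained model and ch_PP-OCRv4_det_student.yml as the config, the parameters are only partially loaded. After modifying the cml config it no longer errors out, but I don't understand this kind of distillation training and don't know how to choose the teacher and student models.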

ymy1005 commented 6 months ago

Is there no student model for ppocr-v4_det? How do I fine-tune the ordinary way with the student config?

countryhotel commented 6 months ago

Here is the relevant environment information:

System Environment: Windows 11 Enterprise
Version: PaddleOCR-release-2.7, ch_PP-OCRv4_det_train, paddlepaddle 2.4.2, CUDA 11.7
Command Code: python tools/train.py -c user_config/ch_PP-OCRv4_det_student.yml
Complete Error Message:
    Global:
      debug: false
      use_gpu: true
      epoch_num: &epoch_num 500
      log_smooth_window: 20
      print_batch_step: 10
      save_model_dir: ./output/ch_PP-OCRv4_det
      save_epoch_step: 10
      eval_batch_step:
      - 0
      - 30
      cal_metric_during_train: false
      checkpoints:
      pretrained_model: user_model/ch_PP-OCRv4_det_train/best_accuracy.pdparams
      save_inference_dir: null
      use_visualdl: false
      infer_img: doc/imgs_en/img_10.jpg
      save_res_path: ./checkpoints/det_db/predicts_db.txt
      distributed: true

    Architecture:
      model_type: det
      algorithm: DB
      Transform: null
      Backbone:
        name: PPLCNetV3
        scale: 0.75
        det: True
      Neck:
        name: RSEFPN
        out_channels: 96
        shortcut: True
      Head:
        name: DBHead
        k: 50

    Loss:
      name: DBLoss
      balance_loss: true
      main_loss_type: DiceLoss
      alpha: 5
      beta: 10
      ohem_ratio: 3

    Optimizer:
      name: Adam
      beta1: 0.9
      beta2: 0.999
      lr:
        name: Const
        learning_rate: 0.0005 #(8*8c)
        warmup_epoch: 0
      regularizer:
        name: L2
        factor: 5.0e-05

    PostProcess:
      name: DBPostProcess
      thresh: 0.3
      box_thresh: 0.6
      max_candidates: 1000
      unclip_ratio: 1.5

    Metric:
      name: DetMetric
      main_indicator: hmean

    Train:
      dataset:
        name: SimpleDataSet
        data_dir: ./CCPD2020/ccpd_green
        label_file_list:
        - ./CCPD2020/PPOCR/train/det.txt
        ratio_list: [1.0]
        transforms:
        - DecodeImage:
            img_mode: BGR
            channel_first: false
        - DetLabelEncode: null
        - CopyPaste: null
        - IaaAugment:
            augmenter_args:
            - type: Fliplr
              args:
                p: 0.5
            - type: Affine
              args:
                rotate:
                - -10
                - 10
            - type: Resize
              args:
                size:
                - 0.5
                - 3
        - EastRandomCropData:
            size:
            - 640
            - 640
            max_tries: 50
            keep_ratio: true
        - MakeBorderMap:
            shrink_ratio: 0.4
            thresh_min: 0.3
            thresh_max: 0.7
            total_epoch: *epoch_num
        - MakeShrinkMap:
            shrink_ratio: 0.4
            min_text_size: 8
            total_epoch: *epoch_num
        - NormalizeImage:
            scale: 1./255.
            mean:
            - 0.485
            - 0.456
            - 0.406
            std:
            - 0.229
            - 0.224
            - 0.225
            order: hwc
        - ToCHWImage: null
        - KeepKeys:
            keep_keys:
            - image
            - threshold_map
            - threshold_mask
            - shrink_map
            - shrink_mask
      loader:
        shuffle: true
        drop_last: false
        batch_size_per_card: 8
        num_workers: 8

    Eval:
      dataset:
        name: SimpleDataSet
        data_dir: ./CCPD2020/ccpd_green
        label_file_list:
        - ./CCPD2020/PPOCR/test/det.txt
        transforms:
        - DecodeImage:
            img_mode: BGR
            channel_first: false
        - DetLabelEncode: null
        - DetResizeForTest:
        - NormalizeImage:
            scale: 1./255.
            mean:
            - 0.485
            - 0.456
            - 0.406
            std:
            - 0.229
            - 0.224
            - 0.225
            order: hwc
        - ToCHWImage: null
        - KeepKeys:
            keep_keys:
            - image
            - shape
            - polys
            - ignore_tags
      loader:
        shuffle: false
        drop_last: false
        batch_size_per_card: 1
        num_workers: 2

    profiler_options: null

countryhotel commented 6 months ago

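> Is there no student model for ppocr-v4_det? How do I fine-tune the ordinary way with the student config?

I downloaded ch_PP-OCRv4_det_train directly from the official site; after unpacking, it contains only a single best_accuracy.pdparams file.

And when I tried to extract the student model with the following code, the result was empty.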

    import paddle

    # Load the pretrained model
    all_params = paddle.load("models/ch_PP-OCRv3_rec_train/best_accuracy.pdparams")

    # Inspect the keys of the weight parameters
    print(all_params.keys())

    # Extract the student model's weights
    s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}

    # Inspect the keys of the student model's weight parameters
    print(s_params.keys())

    # Save
    paddle.save(s_params, "models/ch_PP-OCRv3_rec_train/student.pdparams")
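For reference, the prefix-based extraction above returns an empty dict whenever no key in the checkpoint carries the "Student." prefix. The sketch below reproduces that behavior on plain dicts, so it runs without Paddle; `extract_submodel` and the toy key names are hypothetical, and whether ch_PP-OCRv4_det_train really stores unprefixed keys is an assumption that printing `all_params.keys()` would confirm or refute.

```python
def extract_submodel(all_params, prefix="Student."):
    """Keep only entries whose key starts with `prefix`, with the prefix stripped."""
    return {
        key[len(prefix):]: value
        for key, value in all_params.items()
        if key.startswith(prefix)
    }


# Toy checkpoint saved from a distillation setup: keys carry a sub-model prefix.
distilled = {
    "Student.backbone.conv1.weight": 1,
    "Teacher.backbone.conv1.weight": 2,
}
# Toy checkpoint saved from a single model: keys have no prefix at all.
plain = {"backbone.conv1.weight": 3}

student_from_distilled = extract_submodel(distilled)
student_from_plain = extract_submodel(plain)

print(student_from_distilled)  # one entry, prefix removed
print(student_from_plain)      # empty: nothing to extract
```

An empty result therefore does not by itself mean the file is broken; it may simply already be a single-model checkpoint.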

countryhotel commented 6 months ago

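> I ran into the same problem. With ch_PP-OCRv4_det_train as the pretrained model and ch_PP-OCRv4_det_student.yml as the config, the parameters are only partially loaded. After modifying the cml config it no longer errors out, but I don't understand this kind of distillation training and don't know how to choose the teacher and student models.

I also couldn't find any guidance on the official site about which yml config file goes with which model.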

ymy1005 commented 6 months ago

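> And when I tried to extract the student model with the following code

In another issue, someone fixed the error that ppocr-v4_det raises with the cml config, and the fix works: add det: true to both student models.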
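As a rough illustration of that fix, the sketch below sets the flag on both student backbones of a config that has already been loaded into a dict. The nesting `Architecture -> Models -> Student/Student2 -> Backbone` is an assumption about how the cml config is laid out, the teacher backbone name is a placeholder, and `enable_det_backbone` is a hypothetical helper rather than part of PaddleOCR.

```python
import copy


def enable_det_backbone(config, model_names=("Student", "Student2")):
    """Return a copy of `config` with Backbone.det set to True for the named sub-models."""
    patched = copy.deepcopy(config)  # leave the caller's config untouched
    models = patched["Architecture"]["Models"]
    for name in model_names:
        models[name]["Backbone"]["det"] = True
    return patched


# Toy config fragment; the structure is assumed, not copied from the real file.
cml_config = {
    "Architecture": {
        "Models": {
            "Student": {"Backbone": {"name": "PPLCNetV3", "scale": 0.75}},
            "Student2": {"Backbone": {"name": "PPLCNetV3", "scale": 0.75}},
            "Teacher": {"Backbone": {"name": "placeholder_teacher_backbone"}},
        }
    }
}

patched_config = enable_det_backbone(cml_config)
for name, model in patched_config["Architecture"]["Models"].items():
    print(name, model["Backbone"])
```

Editing the yml file directly amounts to the same thing: one extra `det: True` line under each student's Backbone section, matching the Backbone block in the student config.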

countryhotel commented 6 months ago

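> And when I tried to extract the student model with the following code

> In another issue, someone fixed the error that ppocr-v4_det raises with the cml config, and the fix works: add det: true to both student models.

Do you remember which issue it was?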

ymy1005 commented 6 months ago

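> And when I tried to extract the student model with the following code

> In another issue, someone fixed the error that ppocr-v4_det raises with the cml config, and the fix works: add det: true to both student models.

> Do you remember which issue it was?

Search for "det cml"; it is among the first few results.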

dencentding commented 6 months ago

The error is fixed, but warnings such as "ppocr WARNING: The pretrained params backbone.conv1.hardswish.scale not in model" and "ppocr WARNING: The pretrained params backbone.conv1.hardswish.bias not in model" still appear, and the trained model's hmean is 0 or very low.

ariefwijaya commented 3 months ago

Setting unclip_ratio to 0.5 will solve the issue.

ymy1005 commented 3 months ago

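> Setting unclip_ratio to 0.5 will solve the issue.

The unclip_ratio parameter controls the size of the detection boxes. I have actually increased it; if it is too small, the boxes come out too small and detection becomes inaccurate, right?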
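To make the effect of unclip_ratio concrete: in DB-style post-processing, each detected (shrunk) text region is expanded outward by an offset distance D = A * r / L, where A is the polygon area, L its perimeter, and r the unclip ratio. The sketch below computes only that distance in plain Python; the function names are hypothetical, and the actual polygon offsetting done by the post-processing code is not reproduced here.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths, closing the polygon back to its first vertex."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def unclip_distance(points, unclip_ratio):
    """Offset distance D = A * r / L used to expand a shrunk text region."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


# A 100 x 20 pixel box, roughly the proportions of a license plate text line.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
d_default = unclip_distance(box, 1.5)
d_small = unclip_distance(box, 0.5)
print(f"unclip_ratio=1.5 -> expand by {d_default:.2f} px")
print(f"unclip_ratio=0.5 -> expand by {d_small:.2f} px")
```

For this box the expansion drops from 12.5 px to about 4.2 px when the ratio goes from 1.5 to 0.5, which is consistent with the concern that a small ratio yields tighter boxes.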

xuhangaddadd380 commented 3 months ago

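> Setting unclip_ratio to 0.5 will solve the issue.

> The unclip_ratio parameter controls the size of the detection boxes. I have actually increased it; if it is too small, the boxes come out too small and detection becomes inaccurate, right?

Did you ever solve this? I have run into the same problem: "The pretrained params backbone.conv1.hardswish.scale not in model" and "The pretrained params backbone.conv1.hardswish.bias not in model". I am using ppocr-v4_det, and these two warnings are the only ones I get.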

PaddlePaddle / PaddleOCR

ch_PP-OCRv4_det_train detection model: pretrained weight warnings during training #11607
