PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

After fine-tuning, the PP-OCRv4 detection model performs worse than the original v4 detection model #12166

Closed TinyQi closed 4 months ago

TinyQi commented 4 months ago

Background

Hello, I have a street-view text detection use case and have built a dataset of roughly 1,000 images for fine-tuning. However, the PP-OCRv4 detection model fine-tuned on it performs worse than the original v4 detection model. I have tried adding an open-source street-view dataset (ICDAR2019-LSVT) and adjusting key parameters such as the learning rate, the preprocessing resize size, and box_thresh in post-processing, but none of this produced a noticeable improvement.
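For reference, a minimal sketch of a like-for-like comparison between the stock PP-OCRv4 detector and the fine-tuned one at inference time, using identical DB post-processing thresholds (the paths are placeholders, and the exported inference-model directory is an assumption):

from paddleocr import PaddleOCR

# Identical post-processing settings for both models, mirroring the config further below.
common = dict(det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5)

baseline = PaddleOCR(lang="ch", **common)  # bundled PP-OCRv4 detection model
finetuned = PaddleOCR(det_model_dir="path/to/finetuned_det_infer", **common)  # placeholder path

img = "path/to/street_view.jpg"  # placeholder image
base_boxes = baseline.ocr(img, det=True, rec=False, cls=False)[0] or []
ft_boxes = finetuned.ocr(img, det=True, rec=False, cls=False)[0] or []
print(f"baseline boxes: {len(base_boxes)}, fine-tuned boxes: {len(ft_boxes)}")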

My question: what can I do so that the fine-tuned model's accuracy on my own dataset surpasses the open-source v4 detection model? More data, code changes, different parameters?

PaddleOCR version: 2.7; paddlepaddle-gpu version: 2.4.0

In addition, to help locate the problem faster, here is my preliminary analysis, in case it is useful. From another issue I learned that Baidu trained this v4 detection model on 100k+ images, whereas my fine-tuning dataset contains only about 1,000. Moreover, this self-built dataset of 1,000+ images is quite messy: it contains a large variety of text samples, but most of them appear only once. The training logs show that accuracy trends downward as the number of epochs grows, so the best model is basically a checkpoint saved early in training. My guess is therefore that the dataset being too thin is one reason the fine-tuning works poorly. Below are some cropped text patches from the dataset: varied text styles [image]; varied text colors [image].

TinyQi commented 4 months ago
Here is my fine-tuning training configuration file:
Global:
  debug: false
  use_gpu: true
  epoch_num: &epoch_num 500
  log_smooth_window: 20
  print_batch_step: 100
  save_model_dir: xxxxxxxxx//ch_PP-OCRv4
  save_epoch_step: 10
  eval_batch_step:
  - 0
  - 1500
  cal_metric_during_train: false
  checkpoints:
  # pretrained_model: xxxxxxxxx/ch_PP-OCRv4_det_server_train/best_accuracy.pdparams
  pretrained_model: xxxxxxxxx/PPHGNet_small_ocr_det.pdparams
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./checkpoints/det_db/predicts_db.txt
  distributed: true

Architecture:
  model_type: det
  algorithm: DB
  Transform: null
  Backbone:
    name: PPHGNet_small
    det: True
  Neck:
    name: LKPAN
    out_channels: 256
    intracl: true
  Head:
    name: PFHeadLocal
    k: 50
    mode: "large"

Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001 #(8*8c)
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 1e-6

PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6  # default: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5
  # box_type: poly

Metric:
  name: DetMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /
    label_file_list:
      - xxxxxxxxx//train.txt
    #   - xxxxxxxxx/ICDAR2019-LSVT_ppocr_format/ready_2_train/train.txt
    # ratio_list: [1,1]
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - CopyPaste: null
    - IaaAugment:
        augmenter_args:
        - type: Fliplr
          args:
            p: 0.5
        - type: Affine
          args:
            rotate:
            - -10
            - 10
        - type: Resize
          args:
            size:
            - 0.5
            - 3
    - EastRandomCropData:
        size:
        - 960
        - 960
        max_tries: 50
        keep_ratio: true
    - MakeBorderMap:
        shrink_ratio: 0.4
        thresh_min: 0.3
        thresh_max: 0.7
        total_epoch: *epoch_num
    - MakeShrinkMap:
        shrink_ratio: 0.4
        min_text_size: 8
        total_epoch: *epoch_num
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - threshold_map
        - threshold_mask
        - shrink_map
        - shrink_mask
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 4
    num_workers: 0

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /
    label_file_list:
      - xxxxxxxxx/test.txt
    #   - xxxxxxxxx/ready_2_train/val.txt
    # ratio_list: [0.1,1]
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - DetResizeForTest:
        # limit_side_len: 960
        # limit_type: 'max'
        image_shape: [960,960]
        keep_ratio: false
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - shape
        - polys
        - ignore_tags
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 0
profiler_options: null
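
One note on the Optimizer section above: the base learning rate of 0.0001 uses a cosine schedule with a 2-epoch warmup, and the trailing "(8*8c)" comment suggests it was tuned for a much larger multi-GPU batch than a single card with batch_size_per_card: 4. A rough illustration of that schedule in plain PaddlePaddle (my approximation, not PaddleOCR's exact scheduler code; steps_per_epoch is a made-up value):

import paddle

steps_per_epoch = 250            # hypothetical; depends on dataset size and batch size
epochs, warmup_epochs = 500, 2   # epoch_num and warmup_epoch from the config
base_lr = 0.0001

cosine = paddle.optimizer.lr.CosineAnnealingDecay(
    learning_rate=base_lr, T_max=epochs * steps_per_epoch)
scheduler = paddle.optimizer.lr.LinearWarmup(
    learning_rate=cosine,
    warmup_steps=warmup_epochs * steps_per_epoch,
    start_lr=0.0, end_lr=base_lr)

# The scheduler is stepped once per training iteration; the first few values
# show the linear warmup ramping toward base_lr.
for _ in range(3):
    print(scheduler.get_lr())
    scheduler.step()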
SWHL commented 4 months ago

Hi, a situation like yours can usually be mitigated with the following two steps:
Step 1: Freeze part of the original model's parameters and fine-tune only the last few layers (see the sketch below).
Step 2: Your self-built dataset is too small; building a larger dataset for training is recommended.
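
A minimal sketch of step 1 in plain PaddlePaddle, assuming a custom training script where model is the detection model built from the config above (PaddleOCR's own training entry point may expose this differently, and the "backbone" parameter-name prefix is an assumption):

import paddle

def freeze_backbone(model: paddle.nn.Layer, prefix: str = "backbone") -> int:
    # Mark backbone parameters as non-trainable so only the neck (LKPAN)
    # and head (PFHeadLocal) are updated during fine-tuning.
    frozen = 0
    for name, param in model.named_parameters():
        if name.startswith(prefix):
            param.stop_gradient = True  # excluded from gradient computation and updates
            frozen += 1
    return frozen

# Hypothetical usage: call this after the model is built and the pretrained
# weights are loaded, before constructing the optimizer.
# n = freeze_backbone(model)
# print(f"froze {n} backbone parameters")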