PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

PP-OCRv3 det text detection, teacher model DML training: accuracy stays around 0.6. How can I improve it? Please take a look. #9053

Closed rexzhengzhihong closed 1 year ago

rexzhengzhihong commented 1 year ago

Please provide the following information to quickly locate the problem

Architecture:
  name: DistillationModel
  algorithm: Distillation
  model_type: det
  Models:
    Student:
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list: [7,2,2]
        k: 50
    Student2:
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list: [7,2,2]
        k: 50

Loss:
  name: CombinedLoss
  loss_config_list:

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: 'L2'
    factor: 0

PostProcess:
  name: DistillationDBPostProcess
  model_name: ["Student", "Student2"]
  key: head_out
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DistillationMetric
  base_metric_name: DetMetric
  main_indicator: hmean
  key: "Student"

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det
    label_file_list:

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det
    label_file_list:


- Label file label.txt
Entries look like the following; there are about 20 images.

/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det/train/zzszyfp_9.jpg [{"transcription": "连云港弘果电子有限员任公司", "points": [[138, 80], [407, 80], [407, 94], [138, 94]], "difficult": false, "key_cls": "购买方名称"}, {"transcription": "9132070605183856951", "points": [[137, 95], [407, 95], [407, 107], [137, 107]], "difficult": false, "key_cls": "购买方纳税人识别号"}, {"transcription": "江苏省连云港市海州区周培街魏书路16号0518-17417186", "points": [[138, 108], [407, 108], [407, 120], [138, 120]], "difficult": false, "key_cls": "购买方地址电话"}, {"transcription": "中国建设银行连云港市海州区支行41123350084496", "points": [[138, 124], [408, 124], [408, 134], [138, 134]], "difficult": false, "key_cls": "购买方开户行及账号"}, {"transcription": "(详见销货清单)", "points": [[54, 153], [186, 153], [186, 169], [54, 169]], "difficult": false, "key_cls": "项目名称1"}, {"transcription": "无法识别", "points": [[188, 153], [253, 153], [253, 167], [188, 167]], "difficult": false, "key_cls": "规格型号1"}, {"transcription": "无法识别", "points": [[256, 151], [288, 151], [288, 167], [256, 167]], "difficult": false, "key_cls": "单位1"}, {"transcription": "无法识别", "points": [[289, 151], [343, 151], [343, 170], [289, 170]], "difficult": false, "key_cls": "数量1"}, {"transcription": "无法识别", "points": [[344, 150], [418, 150], [418, 169], [344, 169]], "difficult": false, "key_cls": "单价1"}, {"transcription": "1441200.00", "points": [[421, 151], [506, 151], [506, 169], [421, 169]], "difficult": false, "key_cls": "金额1"}, {"transcription": "酒", "points": [[509, 152], [535, 152], [535, 171], [509, 171]], "difficult": false, "key_cls": "税率1"}, {"transcription": "187356.00", "points": [[536, 153], [638, 153], [638, 170], [536, 170]], "difficult": false, "key_cls": "税额1"}, {"transcription": "¥1441200.00", "points": [[421, 235], [505, 235], [505, 253], [421, 253]], "difficult": false, "key_cls": "合计金额"}, {"transcription": "0095E81夫", "points": [[535, 235], [637, 235], [637, 251], [535, 251]], "difficult": false, "key_cls": "合计税额"}, {"transcription": "壹佰陆洽贰万仟伍伍拾陆元整", "points": [[204, 
255], [435, 255], [435, 271], [204, 271]], "difficult": false, "key_cls": "价税合计大写"}, {"transcription": "1628556.00", "points": [[529, 254], [635, 254], [635, 273], [529, 273]], "difficult": false, "key_cls": "价税合计小写"}, {"transcription": "谷满堂 ", "points": [[354, 328], [416, 328], [416, 343], [354, 343]], "difficult": false, "key_cls": "开票人"}, {"transcription": "2022年04月11日", "points": [[550, 56], [641, 56], [641, 74], [550, 74]], "difficult": false, "key_cls": "开票日期"}, {"transcription": "39882604", "points": [[491, 12], [592, 12], [592, 43], [491, 43]], "difficult": false, "key_cls": "发票号"}, {"transcription": "3207181140", "points": [[111, 19], [215, 19], [215, 39], [111, 39]], "difficult": false, "key_cls": "other"}]



- The images were annotated with PPOCRLabel, as shown in the image below.

![image](https://user-images.githubusercontent.com/32863094/218388700-798c425b-258e-4afc-bd21-4d58505ee2ac.png)
rexzhengzhihong commented 1 year ago

[Uploading train.txt…]()

LDOUBLEV commented 1 year ago

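Entries look like the following; there are about 20 images.

That is too little data, and the model you are using (ResNet50) is very large.

Your annotations also have a problem: regions without text should not be labeled. Detection boxes should mark text regions only.

image

If your scenario is a fixed electronic invoice layout, I suggest first locating the cells you need to recognize, then running the recognition model on the contents of those cells.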

rexzhengzhihong commented 1 year ago

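OK, thanks.
1. The ResNet50 pretrained model is too large. Which model would be more suitable? For example, MobileNetV3_large_x0_5_pretrained?
2. I will fix the annotation problem.
3. It is a fixed scenario, but the screenshots differ, so the cell positions vary slightly. How do I find the cells that need to be recognized?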

rexzhengzhihong commented 1 year ago

@LDOUBLEV

LDOUBLEV commented 1 year ago

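1. The ResNet50 pretrained model is too large. Which model would be more suitable? For example, MobileNetV3_large_x0_5_pretrained?

Change the model structure in the config file to MobileNetV3 as well.

3. It is a fixed scenario, but the screenshots differ, so the cell positions vary slightly. How do I find the cells that need to be recognized?

Search online for table line detection; OpenCV can do it. Then locate the cells whose text you want to recognize.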
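
The table-line idea suggested here can be sketched as follows. On a real invoice image this would normally be done with OpenCV (for example, binarizing the image and applying morphological opening with long, thin kernels). To stay dependency-free, the sketch instead works on a small binary grid in plain Python. The function names, the `min_fraction` threshold, and the toy grid are illustrative assumptions, not part of PaddleOCR or OpenCV.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 1 = dark (ink) pixel, 0 = background


def find_line_positions(grid: Grid, axis: str, min_fraction: float = 0.8) -> List[int]:
    """Return row (axis='h') or column (axis='v') indices that look like table lines.

    A row/column counts as a line when at least `min_fraction` of its pixels
    are dark. Runs of adjacent indices are reported once, at their first index.
    """
    height, width = len(grid), len(grid[0])
    if axis == "h":
        fractions = [sum(row) / width for row in grid]
    elif axis == "v":
        fractions = [sum(grid[y][x] for y in range(height)) / height for x in range(width)]
    else:
        raise ValueError("axis must be 'h' or 'v'")

    positions: List[int] = []
    previous: Optional[int] = None
    for index, fraction in enumerate(fractions):
        if fraction >= min_fraction:
            if previous is None or index != previous + 1:
                positions.append(index)
            previous = index
    return positions


def cells_from_lines(rows: List[int], cols: List[int]) -> List[Tuple[int, int, int, int]]:
    """Build (x0, y0, x1, y1) boxes for the cells between consecutive lines."""
    cells = []
    for y0, y1 in zip(rows, rows[1:]):
        for x0, x1 in zip(cols, cols[1:]):
            cells.append((x0 + 1, y0 + 1, x1 - 1, y1 - 1))
    return cells


def make_toy_grid() -> Grid:
    """A 7x9 grid with lines on rows 0, 3, 6 and columns 0, 4, 8."""
    grid = [[0] * 9 for _ in range(7)]
    for y in (0, 3, 6):
        grid[y] = [1] * 9
    for x in (0, 4, 8):
        for y in range(7):
            grid[y][x] = 1
    return grid


if __name__ == "__main__":
    toy = make_toy_grid()
    rows = find_line_positions(toy, "h")
    cols = find_line_positions(toy, "v")
    print(cells_from_lines(rows, cols))
```

This projection approach only works when the ruling lines span most of the image; tables whose lines cover a small part of a screenshot would need the region cropped first or a lower `min_fraction`. Each resulting cell box could then be cropped and passed to the recognition model.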

rexzhengzhihong commented 1 year ago

Some cells have a value in some invoices and are empty in others (as shown below). Should such a cell be labeled? After relabeling, hmean increased, but some cells are still missing from the prediction results. image

LDOUBLEV commented 1 year ago

If there is no content, do not label it. Text detection only detects regions that contain text.
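
A minimal sketch of cleaning an existing label file along these lines, assuming the usual PaddleOCR detection label layout of `image_path<TAB>JSON list` per line. It drops boxes whose transcription is empty or equals `无法识别`, which is an assumption about how empty cells were marked in the sample posted above; boxes with that text should be checked by eye before deleting, since some may contain real but unreadable text. The function names are hypothetical.

```python
import json
from typing import List, Tuple

# Assumption: these transcriptions mark cells that contain no text.
PLACEHOLDERS = {"", "无法识别"}


def split_label_line(line: str) -> Tuple[str, List[dict]]:
    """Split one label line into (image_path, list of box annotations)."""
    line = line.rstrip("\n")
    if "\t" in line:
        # Normal case: path and JSON payload are separated by a tab.
        path, payload = line.split("\t", 1)
    else:
        # Fallback for lines where the tab was lost when pasting.
        cut = line.index("[")
        path, payload = line[:cut].rstrip(), line[cut:]
    return path, json.loads(payload)


def drop_empty_boxes(line: str) -> str:
    """Return the label line with placeholder-only boxes removed."""
    path, boxes = split_label_line(line)
    kept = [b for b in boxes if b.get("transcription", "").strip() not in PLACEHOLDERS]
    return path + "\t" + json.dumps(kept, ensure_ascii=False)


if __name__ == "__main__":
    sample = (
        'train/demo.jpg\t[{"transcription": "无法识别", "points": [[0, 0], [9, 0], [9, 9], [0, 9]]}, '
        '{"transcription": "39882604", "points": [[0, 0], [9, 0], [9, 9], [0, 9]]}]'
    )
    print(drop_empty_boxes(sample))
```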

rexzhengzhihong commented 1 year ago

I switched the model. Accuracy still barely changed. image

rexzhengzhihong commented 1 year ago

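After relabeling, and still using the original ResNet50 pretrained model, accuracy improved during evaluation. At prediction time, however, the results are still far off.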

rexzhengzhihong commented 1 year ago

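My workflow is:
1. Train the detection teacher model with the DML distillation method.
2. Fine-tune training based on the DML distillation method.
3. Convert the trained model to an inference model.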

LDOUBLEV commented 1 year ago

The main problem now is that there is too little data. Find a way to expand the dataset.

rexzhengzhihong commented 1 year ago

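The main problem now is that there is too little data. Find a way to expand the dataset.

OK, I will add more data. One more question: after the "fine-tune training based on the DML distillation method" step I get a trained model, and predicting with that trained model works reasonably well. But as soon as I export it to an inference model and predict with the inference model, the results are much worse. My export command is: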

python3 tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o \
Global.pretrained_model="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" \
Global.save_inference_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/"

The command for predicting with the trained model is:

python3 tools/infer_det.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o \
Global.infer_img="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/test/det_imgs/" \
Global.pretrained_model="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" \
Global.save_res_path="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/test/det_imgsresult/predicts_db.txt"

The command for predicting with the inference model is:

python3 tools/infer/predict_det.py --det_algorithm="DB" \
--det_model_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/Student" \
--image_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/test/det_imgs/" \
--draw_img_save_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/test/det_imgsresult/" \
--use_gpu=True

What could be the reason?
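
One possible cause of such a gap, not confirmed anywhere in this thread, is that the two scripts resize the input differently. In the PaddleOCR versions this sketch assumes, `DetResizeForTest` with no arguments (as in the Eval section of the config) falls back to `limit_side_len=736` with `limit_type='min'`, while `tools/infer/predict_det.py` defaults to `--det_limit_side_len=960` and `--det_limit_type=max`. Both defaults are assumptions to verify against the installed version. The sketch reimplements the resize arithmetic in plain Python to show how far apart the input sizes can end up; the 350x650 image size is a hypothetical example.

```python
from typing import Tuple


def det_resize_shape(height: int, width: int, limit_side_len: int, limit_type: str) -> Tuple[int, int]:
    """Approximate the (height, width) the detector sees after test-time resizing.

    Mirrors the limit_side_len / limit_type logic as understood from PaddleOCR;
    treat it as an assumption and compare with the installed source.
    """
    if limit_type == "max":
        # Only shrink images whose longest side exceeds the limit.
        longest = max(height, width)
        ratio = limit_side_len / longest if longest > limit_side_len else 1.0
    elif limit_type == "min":
        # Only enlarge images whose shortest side is below the limit.
        shortest = min(height, width)
        ratio = limit_side_len / shortest if shortest < limit_side_len else 1.0
    else:
        raise ValueError("limit_type must be 'max' or 'min'")
    # Both sides are rounded to a multiple of 32, with a floor of 32.
    new_h = max(int(round(height * ratio / 32) * 32), 32)
    new_w = max(int(round(width * ratio / 32) * 32), 32)
    return new_h, new_w


if __name__ == "__main__":
    h, w = 350, 650  # hypothetical screenshot size
    print("min/736:", det_resize_shape(h, w, 736, "min"))
    print("max/960:", det_resize_shape(h, w, 960, "max"))
```

If this turns out to be the cause, passing `--det_limit_side_len=736 --det_limit_type=min` to `tools/infer/predict_det.py` would make the exported model see roughly the same input size as `tools/infer_det.py` does.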

LDOUBLEV commented 1 year ago

Change the export command to:

python3 tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o \
Global.checkpoints="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" \
Global.save_inference_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/"

and try again. That is, replace Global.pretrained_model with Global.checkpoints.

The pretrained model may not have been loaded.

rexzhengzhihong commented 1 year ago

I switched Global.pretrained_model to Global.checkpoints as suggested. The result is the same; no improvement.

rexzhengzhihong commented 1 year ago

File ch_PP-OCRv3_det_dml_zzszyfp.yml:

Global:
  use_gpu: true
  #epoch_num: 1200
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 2
  save_model_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/output/ch_db_mv3/
  #save_epoch_step: 1200
  save_epoch_step: 1200
  # evaluation is run every 5000 iterations after the 4000th iteration
  # eval_batch_step: [3000, 2000]
  eval_batch_step: [0, 20]
  cal_metric_during_train: False
  pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./output/det_db/predicts_db.txt

Architecture:
  name: DistillationModel
  algorithm: Distillation
  model_type: det
  Models:
    Student:
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list: [7,2,2]
        k: 50
    Student2:
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list: [7,2,2]
        k: 50

Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDMLLoss:
      model_name_pairs:
      - ["Student", "Student2"]
      maps_name: "thrink_maps"
      weight: 1.0
      # act: None
      model_name_pairs: ["Student", "Student2"]
      key: maps
  - DistillationDBLoss:
      weight: 1.0
      model_name_list: ["Student", "Student2"]
      # key: maps
      name: DBLoss
      balance_loss: true
      main_loss_type: DiceLoss
      alpha: 5
      beta: 10
      ohem_ratio: 3

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: 'L2'
    factor: 0

PostProcess:
  name: DistillationDBPostProcess
  model_name: ["Student", "Student2"]
  key: head_out
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DistillationMetric
  base_metric_name: DetMetric
  main_indicator: hmean
  key: "Student"

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det
    label_file_list:
      - /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det/train.txt
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - CopyPaste:
      - IaaAugment:
          augmenter_args:
            - { 'type': Fliplr, 'args': { 'p': 0.5 } }
            - { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
            - { 'type': Resize, 'args': { 'size': [0.5, 3] } }
      - EastRandomCropData:
          size: [960, 960]
          max_tries: 50
          keep_ratio: true
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
  loader:
    shuffle: True
    drop_last: False
    #batch_size_per_card: 8
    batch_size_per_card: 2
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det
    label_file_list:
      - /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det/val.txt
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - DetResizeForTest:
#           image_shape: [736, 1280]
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 2
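
The `shrink_ratio` under MakeShrinkMap and the `unclip_ratio` under PostProcess in this config are two sides of the same DB mechanism: training targets are text polygons shrunk inward, and post-processing expands the predicted regions back out. Below is a minimal sketch of the two offset distances, assuming the usual DB formulas based on polygon area and perimeter. The helper names are illustrative, and the actual polygon offsetting (done with a clipping library in PaddleOCR) is omitted.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def polygon_area(points: List[Point]) -> float:
    """Shoelace formula; returns the absolute area."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def polygon_perimeter(points: List[Point]) -> float:
    """Sum of edge lengths around the closed polygon."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def shrink_distance(points: List[Point], shrink_ratio: float = 0.4) -> float:
    """Inward offset for the training shrink map: A * (1 - r^2) / L."""
    return polygon_area(points) * (1 - shrink_ratio ** 2) / polygon_perimeter(points)


def unclip_distance(points: List[Point], unclip_ratio: float = 1.5) -> float:
    """Outward offset applied to a predicted region: A * r / L."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


if __name__ == "__main__":
    # First box from the label sample in this thread (269 x 14 pixels).
    box = [[138, 80], [407, 80], [407, 94], [138, 94]]
    print(round(shrink_distance(box), 3), round(unclip_distance(box), 3))
```

For the 14-pixel-tall box used here, the shrink distance is about 5.6 pixels per side, so only a strip of roughly 3 pixels remains as the training target. Under these assumptions, text this small leaves the detector very little to learn from at the original resolution.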
rexzhengzhihong commented 1 year ago

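The main problem now is that there is too little data. Find a way to expand the dataset.

Indeed. I increased the data to 50 images and accuracy improved a lot; it is now above 0.8.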
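
The "accuracy" tracked in this thread is the `hmean` set as `main_indicator` in the config, i.e. the harmonic mean of detection precision and recall. A minimal sketch of that calculation follows, assuming the counts of matched, detected, and ground-truth boxes are already known. How boxes are matched (typically by an IoU threshold) is left out, and the function names and example counts are illustrative.

```python
from typing import Dict


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def det_scores(matched: int, detected: int, ground_truth: int) -> Dict[str, float]:
    """Precision, recall and hmean from box counts."""
    precision = matched / detected if detected else 0.0
    recall = matched / ground_truth if ground_truth else 0.0
    return {"precision": precision, "recall": recall, "hmean": hmean(precision, recall)}


if __name__ == "__main__":
    # Hypothetical counts: 6 correct boxes out of 12 predicted, 8 labeled.
    print(det_scores(matched=6, detected=12, ground_truth=8))
```

Because hmean penalizes whichever of precision or recall is lower, extra boxes drawn around empty cells in the labels can hold the score down even when the text regions themselves are found.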

rexzhengzhihong commented 1 year ago

I switched Global.pretrained_model to Global.checkpoints as suggested. The result is the same; no improvement. What else could be the cause? Could you help take a look?

@LDOUBLEV

rexzhengzhihong commented 1 year ago

It works now.

bianliuyang commented 1 year ago

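Hello, when you trained with DML, did you see this warning: paddle WARNING:ppocr warning: the pretrained params backbone.* not in model? I am using the same config file and the same pretrained model as you, and I do not know why this happens on my side.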
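
A warning of this form generally means that parameter names in the pretrained file have no counterpart in the model being built. One plausible explanation, not confirmed in this thread, is a naming mismatch: a distillation model holds its sub-networks under names such as `Student` and `Student2`, so its parameters would be called `Student.backbone...`, while a plain pretrained file uses `backbone...`. The config posted above also points `Global.pretrained_model` at a MobileNetV3 file while the backbone is `ResNet_vd`, which would produce mismatches as well. The sketch compares two sets of parameter names in plain Python; the key names are made up for illustration, and with real files the names would have to be read from the checkpoint and the model first.

```python
from typing import Dict, Iterable, List


def compare_param_names(pretrained_keys: Iterable[str], model_keys: Iterable[str]) -> Dict[str, List[str]]:
    """Report which pretrained parameters would load and which would be skipped."""
    pretrained, model = set(pretrained_keys), set(model_keys)
    return {
        "loaded": sorted(pretrained & model),
        "not_in_model": sorted(pretrained - model),
        "not_in_pretrained": sorted(model - pretrained),
    }


def add_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    """Prefix parameter names the way a named sub-model would."""
    return [f"{prefix}.{key}" for key in keys]


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    pretrained = ["backbone.conv1.weight", "backbone.conv1.bias"]
    model = add_prefix(pretrained, "Student") + add_prefix(pretrained, "Student2")

    report = compare_param_names(pretrained, model)
    print("not in model:", report["not_in_model"])

    # After mapping the pretrained names onto one sub-model, they line up.
    remapped = compare_param_names(add_prefix(pretrained, "Student"), model)
    print("loaded after remap:", remapped["loaded"])
```

If every pretrained key lands in `not_in_model`, none of the pretrained weights were applied and training effectively starts from random initialization, which would matter a great deal with only a few dozen images.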

papersuper commented 1 year ago

I switched Global.pretrained_model to Global.checkpoints as suggested. The result is the same; no improvement.

May I ask how you solved this in the end?

keyfall commented 10 months ago

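Hi, do you still have the dataset? I have a similar project. Could you send me a copy? Thanks. If that is possible, please leave a comment; my email is 2536726426@qq.com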