PP-OCRv3 det text detection teacher model DML training: accuracy stays around 0.6. How can I improve it? Please take a look.

rexzhengzhihong commented 1 year ago

Please provide the following information to quickly locate the problem

System Environment: ubuntu20.04
Version: Paddle: PaddleOCR: Related components: paddle 1.0.2, paddle-bfloat 0.1.2, paddle2onnx 0.9.7, paddlefsl 1.1.0, paddlenlp 2.4.1, paddleocr 2.6.1.2, paddlepaddle-gpu 2.4.1.post116, pandas 1.1.5, pandocfilters 1.5.0

Command Code:

python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Global.save_model_dir=/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/output/ch_db_mv3/

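Complete Error Message: The evaluation result is hmean: 0.6829268292682926, precision: 0.6666666666666666, recall: 0.7, so the predictions of the exported model are not very good. Is this a problem with the annotated data, the training method, or the configuration? Could you give me some pointers?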
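
For readers checking the numbers: the reported hmean is consistent with the harmonic mean of the reported precision and recall. A minimal sketch, where the function name `hmean` is only illustrative and not a PaddleOCR API:

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    if total == 0:
        return 0.0
    return 2 * precision * recall / total

# Precision and recall as reported in the evaluation output above.
reported = hmean(0.6666666666666666, 0.7)
print(round(reported, 4))  # 0.6829
```

Because the harmonic mean is pulled toward the lower of the two values, both precision and recall have to rise for hmean to move much.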

The file ch_PP-OCRv3_det_dml_zzszyfp.yml:


Global:
  use_gpu: true
  #epoch_num: 1200
  epoch_num: 120
  log_smooth_window: 20
  print_batch_step: 2
  save_model_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/output/ch_db_mv3/
  #save_epoch_step: 1200
  save_epoch_step: 1200
  # evaluation is run every 5000 iterations after the 4000th iteration
  # eval_batch_step: [3000, 2000]
  eval_batch_step: [0, 20]
  cal_metric_during_train: False
  pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./output/det_db/predicts_db.txt

Architecture:
  name: DistillationModel
  algorithm: Distillation
  model_type: det
  Models:
    Student:
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list: [7,2,2]
        k: 50
    Student2:
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list: [7,2,2]
        k: 50

Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDMLLoss:
      model_name_pairs:
      - ["Student", "Student2"]
      maps_name: "thrink_maps"
      weight: 1.0
      # act: None
      model_name_pairs: ["Student", "Student2"]
      key: maps
  - DistillationDBLoss:
      weight: 1.0
      model_name_list: ["Student", "Student2"]
      # key: maps
      name: DBLoss
      balance_loss: true
      main_loss_type: DiceLoss
      alpha: 5
      beta: 10
      ohem_ratio: 3

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: 'L2'
    factor: 0

PostProcess:
  name: DistillationDBPostProcess
  model_name: ["Student", "Student2"]
  key: head_out
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DistillationMetric
  base_metric_name: DetMetric
  main_indicator: hmean
  key: "Student"

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det
    label_file_list:
      - /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det/train.txt
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - CopyPaste:
      - IaaAugment:
          augmenter_args:
            - { 'type': Fliplr, 'args': { 'p': 0.5 } }
            - { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
            - { 'type': Resize, 'args': { 'size': [0.5, 3] } }
      - EastRandomCropData:
          size: [960, 960]
          max_tries: 50
          keep_ratio: true
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
  loader:
    shuffle: True
    drop_last: False
    # batch_size_per_card: 8
    batch_size_per_card: 2
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det
    label_file_list:
      - /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det/val.txt
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - DetResizeForTest:
          # image_shape: [736, 1280]
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 2

- Annotation file label.txt
There are about 20 images, annotated in a format like this:

/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det/train/zzszyfp_9.jpg [{"transcription": "连云港弘果电子有限员任公司", "points": [[138, 80], [407, 80], [407, 94], [138, 94]], "difficult": false, "key_cls": "购买方名称"}, {"transcription": "9132070605183856951", "points": [[137, 95], [407, 95], [407, 107], [137, 107]], "difficult": false, "key_cls": "购买方纳税人识别号"}, {"transcription": "江苏省连云港市海州区周培街魏书路16号0518-17417186", "points": [[138, 108], [407, 108], [407, 120], [138, 120]], "difficult": false, "key_cls": "购买方地址电话"}, {"transcription": "中国建设银行连云港市海州区支行41123350084496", "points": [[138, 124], [408, 124], [408, 134], [138, 134]], "difficult": false, "key_cls": "购买方开户行及账号"}, {"transcription": "(详见销货清单)", "points": [[54, 153], [186, 153], [186, 169], [54, 169]], "difficult": false, "key_cls": "项目名称1"}, {"transcription": "无法识别", "points": [[188, 153], [253, 153], [253, 167], [188, 167]], "difficult": false, "key_cls": "规格型号1"}, {"transcription": "无法识别", "points": [[256, 151], [288, 151], [288, 167], [256, 167]], "difficult": false, "key_cls": "单位1"}, {"transcription": "无法识别", "points": [[289, 151], [343, 151], [343, 170], [289, 170]], "difficult": false, "key_cls": "数量1"}, {"transcription": "无法识别", "points": [[344, 150], [418, 150], [418, 169], [344, 169]], "difficult": false, "key_cls": "单价1"}, {"transcription": "1441200.00", "points": [[421, 151], [506, 151], [506, 169], [421, 169]], "difficult": false, "key_cls": "金额1"}, {"transcription": "酒", "points": [[509, 152], [535, 152], [535, 171], [509, 171]], "difficult": false, "key_cls": "税率1"}, {"transcription": "187356.00", "points": [[536, 153], [638, 153], [638, 170], [536, 170]], "difficult": false, "key_cls": "税额1"}, {"transcription": "￥1441200.00", "points": [[421, 235], [505, 235], [505, 253], [421, 253]], "difficult": false, "key_cls": "合计金额"}, {"transcription": "0095E81夫", "points": [[535, 235], [637, 235], [637, 251], [535, 251]], "difficult": false, "key_cls": "合计税额"}, {"transcription": "壹佰陆洽贰万仟伍伍拾陆元整", "points": [[204, 
255], [435, 255], [435, 271], [204, 271]], "difficult": false, "key_cls": "价税合计大写"}, {"transcription": "1628556.00", "points": [[529, 254], [635, 254], [635, 273], [529, 273]], "difficult": false, "key_cls": "价税合计小写"}, {"transcription": "谷满堂 ", "points": [[354, 328], [416, 328], [416, 343], [354, 343]], "difficult": false, "key_cls": "开票人"}, {"transcription": "2022年04月11日", "points": [[550, 56], [641, 56], [641, 74], [550, 74]], "difficult": false, "key_cls": "开票日期"}, {"transcription": "39882604", "points": [[491, 12], [592, 12], [592, 43], [491, 43]], "difficult": false, "key_cls": "发票号"}, {"transcription": "3207181140", "points": [[111, 19], [215, 19], [215, 39], [111, 39]], "difficult": false, "key_cls": "other"}]
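
A label line like the one above can be sanity-checked before training. The sketch below assumes the image path and the JSON list are separated by a tab; the sample path `det/train/example.jpg` and the helper names are hypothetical. It prints the box count and the smallest box height, which may be worth watching because the config sets `min_text_size: 8` and applies a `Resize` augmentation down to 0.5x.

```python
import json

def parse_label_line(line: str):
    """Split one label line into (image_path, list of annotation dicts)."""
    # Assumption: path and JSON payload are separated by a single tab.
    image_path, _, payload = line.rstrip("\n").partition("\t")
    return image_path, json.loads(payload)

def summarize(annotations):
    """Return (box count, smallest box height in pixels)."""
    heights = []
    for ann in annotations:
        ys = [pt[1] for pt in ann["points"]]
        heights.append(max(ys) - min(ys))
    return len(annotations), min(heights)

# Two boxes copied from the sample label; the path is a hypothetical stand-in.
sample = (
    "det/train/example.jpg\t"
    '[{"transcription": "39882604", "points": [[491, 12], [592, 12], [592, 43], [491, 43]], "difficult": false},'
    ' {"transcription": "1441200.00", "points": [[421, 151], [506, 151], [506, 169], [421, 169]], "difficult": false}]'
)
path, anns = parse_label_line(sample)
print(path, summarize(anns))  # det/train/example.jpg (2, 18)
```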



- The annotations were made with PPOCRLabel, as shown in the image below.

![image](https://user-images.githubusercontent.com/32863094/218388700-798c425b-258e-4afc-bd21-4d58505ee2ac.png)

LDOUBLEV commented 1 year ago

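> There are about 20 images, annotated in a format like this

That is too little data, and the model you are using (ResNet50) is large.

Your annotations also have a problem: areas without text should not be annotated. Detection boxes should mark text regions only.

If your scenario is a fixed e-invoice layout, I suggest first locating the cells you need to recognize, then running the recognition model on the contents of those cells.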

rexzhengzhihong commented 1 year ago

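OK, thanks.
1. The ResNet50 pretrained model is too large. Which model is more suitable? For example, MobileNetV3_large_x0_5_pretrained?
2. I will fix the annotations.
3. It is a fixed scenario, but the screenshots differ, so the cell positions vary a little. How do I find the cells that need to be recognized?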

rexzhengzhihong commented 1 year ago

@LDOUBLEV

LDOUBLEV commented 1 year ago

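> 1. The ResNet50 pretrained model is too large. Which model is more suitable? For example, MobileNetV3_large_x0_5_pretrained?

Change the model structure in the config file to MobileNetV3 as well.

> 3. It is a fixed scenario, but the screenshots differ, so the cell positions vary a little. How do I find the cells that need to be recognized?

Search online for table line detection; OpenCV can do it. Then locate the cells whose text you want to recognize.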
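
The reply suggests OpenCV, where table lines are commonly isolated with morphological operations. To show only the underlying idea without that dependency, here is a toy sketch using projection profiles on an already binarized image: rows and columns that are almost entirely dark are treated as grid lines, and the cells are the rectangles between consecutive lines. The function names and the `min_fill` threshold are assumptions for illustration; real screenshots would also need binarization and some tolerance for skew and broken lines.

```python
def find_line_indices(binary, axis, min_fill=0.9):
    """Indices of rows (axis=0) or columns (axis=1) that are at least min_fill dark."""
    rows, cols = len(binary), len(binary[0])
    if axis == 0:
        sums = [sum(row) for row in binary]
        length = cols
    else:
        sums = [sum(binary[r][c] for r in range(rows)) for c in range(cols)]
        length = rows
    return [i for i, s in enumerate(sums) if s >= min_fill * length]

def cells_from_lines(h_lines, v_lines):
    """Inclusive boxes (x0, y0, x1, y1) lying strictly between consecutive grid lines."""
    cells = []
    for y0, y1 in zip(h_lines, h_lines[1:]):
        for x0, x1 in zip(v_lines, v_lines[1:]):
            # Skip adjacent indices, which belong to one thick line rather than a cell.
            if y1 - y0 > 1 and x1 - x0 > 1:
                cells.append((x0 + 1, y0 + 1, x1 - 1, y1 - 1))
    return cells

# Synthetic 7x9 binary image: 1 = dark pixel, 0 = background.
H, W = 7, 9
grid = [[0] * W for _ in range(H)]
for y in (0, 3, 6):
    grid[y] = [1] * W
for x in (0, 4, 8):
    for y in range(H):
        grid[y][x] = 1

h = find_line_indices(grid, axis=0)
v = find_line_indices(grid, axis=1)
print(h, v)                    # [0, 3, 6] [0, 4, 8]
print(cells_from_lines(h, v))  # [(1, 1, 3, 2), (5, 1, 7, 2), (1, 4, 3, 5), (5, 4, 7, 5)]
```

Each resulting box can then be cropped and passed to the recognition model, which is the workflow the reply describes.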

rexzhengzhihong commented 1 year ago

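> 1. The ResNet50 pretrained model is too large. Which model is more suitable? For example, MobileNetV3_large_x0_5_pretrained?
>
> Change the model structure in the config file to MobileNetV3 as well.
>
> 3. It is a fixed scenario, but the screenshots differ, so the cell positions vary a little. How do I find the cells that need to be recognized?
>
> Search online for table line detection; OpenCV can do it. Then locate the cells whose text you want to recognize.

Some fields sometimes have a value and sometimes do not (as in the image below). Does such an empty field need to be annotated? After I re-annotated, hmean increased, but in the prediction results some fields are still not detected.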

LDOUBLEV commented 1 year ago

If there is no content, do not annotate it. Text detection only detects regions that contain text.
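
One way to apply this advice to an existing label file is to drop boxes whose transcription is only a placeholder. The sketch below assumes that entries transcribed as "无法识别" (as seen in the sample label) mark cells with no readable text; that is an assumption about this dataset, not something confirmed in the thread, so the removed boxes should be reviewed by hand. The helper name and sample path are hypothetical.

```python
import json

# Assumption: boxes with these transcriptions were drawn around cells with no
# readable text. Adjust the set after inspecting your own labels.
PLACEHOLDERS = {"无法识别", ""}

def drop_placeholder_boxes(line, placeholders=PLACEHOLDERS):
    """Return the label line with placeholder-only annotations removed."""
    # Assumption: path and JSON payload are separated by a single tab.
    image_path, _, payload = line.rstrip("\n").partition("\t")
    annotations = json.loads(payload)
    kept = [a for a in annotations if a["transcription"].strip() not in placeholders]
    return image_path + "\t" + json.dumps(kept, ensure_ascii=False)

sample = (
    "det/train/example.jpg\t"
    '[{"transcription": "无法识别", "points": [[188, 153], [253, 153], [253, 167], [188, 167]], "difficult": false},'
    ' {"transcription": "1441200.00", "points": [[421, 151], [506, 151], [506, 169], [421, 169]], "difficult": false}]'
)
cleaned = drop_placeholder_boxes(sample)
print(cleaned)
```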

rexzhengzhihong commented 1 year ago

I switched the model. The accuracy is still basically unchanged.

rexzhengzhihong commented 1 year ago

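After re-annotating, and still using the original ResNet50 pretrained model, the accuracy at evaluation time improved. But at prediction time the results are still quite far off.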

rexzhengzhihong commented 1 year ago

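My steps are:
1. Train the detection teacher model with the DML distillation method.
2. Fine-tune training based on the DML distillation method.
3. Convert the trained model to finetune.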

LDOUBLEV commented 1 year ago

The main problem right now is that there is too little data. Find a way to expand the dataset.

rexzhengzhihong commented 1 year ago

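> The main problem right now is that there is too little data. Find a way to expand the dataset.

OK. I will add more data. One more question: after the "fine-tune training based on the DML distillation method" I get a trained model. Predicting with the trained model works reasonably well. But as soon as I export it to an inference model and predict with that, the results are much worse. My export command is: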

python3 tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o \
Global.pretrained_model="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" \
Global.save_inference_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/"

The command for predicting with the regular (trained) model is:

python3 tools/infer_det.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o \
Global.infer_img="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/test/det_imgs/" \
Global.pretrained_model="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" \
Global.save_res_path="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/test/det_imgsresult/predicts_db.txt"

The command for predicting with the inference model is:

python3 tools/infer/predict_det.py --det_algorithm="DB" \
--det_model_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/Student" \
--image_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/test/det_imgs/" \
--draw_img_save_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/test/det_imgsresult/" \
--use_gpu=True

What could be the cause?
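
To put a number on "much worse", the boxes from the two prediction paths can be compared for the same image. This is a minimal sketch that treats each quadrilateral as its axis-aligned bounding rectangle and lists the boxes from one run that have no counterpart in the other. The function names, the 0.5 threshold, and the example coordinates are illustrative assumptions; loading the boxes from each tool's result file is left out.

```python
def bbox(points):
    """Axis-aligned bounding rectangle (x0, y0, x1, y1) of a polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def iou(a, b):
    """Intersection over union of the bounding rectangles of two polygons."""
    ax0, ay0, ax1, ay1 = bbox(a)
    bx0, by0, bx1, by1 = bbox(b)
    iw = max(0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union else 0.0

def unmatched(reference, candidate, thresh=0.5):
    """Reference boxes that no candidate box overlaps with IoU >= thresh."""
    return [r for r in reference if all(iou(r, c) < thresh for c in candidate)]

# Made-up boxes standing in for the two outputs on one image.
train_model_boxes = [
    [[0, 0], [10, 0], [10, 10], [0, 10]],
    [[20, 0], [30, 0], [30, 10], [20, 10]],
]
inference_boxes = [
    [[1, 0], [11, 0], [11, 10], [1, 10]],
]
missing = unmatched(train_model_boxes, inference_boxes)
print(missing)  # [[[20, 0], [30, 0], [30, 10], [20, 10]]]
```

If many boxes go missing only in the inference path, differences in preprocessing or post-processing settings between the two tools are one thing worth checking; the thread itself does not state the eventual cause.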

LDOUBLEV commented 1 year ago

> python3 tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o \
> Global.pretrained_model="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" \
> Global.save_inference_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/"

Change this command to:

python3 tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o \
Global.checkpoints="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" \
Global.save_inference_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/"

and try that.

It may be that the pretrained model was not loaded.

rexzhengzhihong commented 1 year ago

> python3 tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o Global.pretrained_model="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" Global.save_inference_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/"
>
> Change this command to: python3 tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o Global.checkpoints="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" Global.save_inference_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/"
>
> and try that.
>
> It may be that the pretrained model was not loaded.

I changed it, but the result is the same. No improvement.

rexzhengzhihong commented 1 year ago

The file ch_PP-OCRv3_det_dml_zzszyfp.yml:

Global:
  use_gpu: true
  #epoch_num: 1200
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 2
  save_model_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/output/ch_db_mv3/
  #save_epoch_step: 1200
  save_epoch_step: 1200
  # evaluation is run every 5000 iterations after the 4000th iteration
  # eval_batch_step: [3000, 2000]
  eval_batch_step: [0, 20]
  cal_metric_during_train: False
  pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./output/det_db/predicts_db.txt

Architecture:
  name: DistillationModel
  algorithm: Distillation
  model_type: det
  Models:
    Student:
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list: [7,2,2]
        k: 50
    Student2:
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list: [7,2,2]
        k: 50

Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDMLLoss:
      model_name_pairs:
      - ["Student", "Student2"]
      maps_name: "thrink_maps"
      weight: 1.0
      # act: None
      model_name_pairs: ["Student", "Student2"]
      key: maps
  - DistillationDBLoss:
      weight: 1.0
      model_name_list: ["Student", "Student2"]
      # key: maps
      name: DBLoss
      balance_loss: true
      main_loss_type: DiceLoss
      alpha: 5
      beta: 10
      ohem_ratio: 3

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: 'L2'
    factor: 0

PostProcess:
  name: DistillationDBPostProcess
  model_name: ["Student", "Student2"]
  key: head_out
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DistillationMetric
  base_metric_name: DetMetric
  main_indicator: hmean
  key: "Student"

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det
    label_file_list:
      - /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det/train.txt
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - CopyPaste:
      - IaaAugment:
          augmenter_args:
            - { 'type': Fliplr, 'args': { 'p': 0.5 } }
            - { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
            - { 'type': Resize, 'args': { 'size': [0.5, 3] } }
      - EastRandomCropData:
          size: [960, 960]
          max_tries: 50
          keep_ratio: true
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
  loader:
    shuffle: True
    drop_last: False
    #batch_size_per_card: 8
    batch_size_per_card: 2
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det
    label_file_list:
      - /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det/val.txt
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - DetResizeForTest:
#           image_shape: [736, 1280]
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 2

rexzhengzhihong commented 1 year ago

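> The main problem right now is that there is too little data. Find a way to expand the dataset.

Indeed. After increasing the data to 50 images, the accuracy went up considerably, to 0.8+.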

rexzhengzhihong commented 1 year ago

> python3 tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o Global.pretrained_model="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" Global.save_inference_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/"
>
> Change this command to: python3 tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o Global.checkpoints="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" Global.save_inference_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/"
>
> and try that. It may be that the pretrained model was not loaded.

I changed it, but the result is the same. No improvement. What else could be the cause? Could you help take a look?

@LDOUBLEV

rexzhengzhihong commented 1 year ago

It works now.

bianliuyang commented 1 year ago

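Hello. When you train with DML, do you get a warning like this: paddle WARNING: ppocr warning: the pretrained params backbone.* not in model? I am using the same config file and the same pretrained model as you, and I do not know why this happens on my side.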

papersuper commented 1 year ago

> python3 tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o Global.pretrained_model="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" Global.save_inference_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/"
>
> Change this command to: python3 tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml_zzszyfp.yml -o Global.checkpoints="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml/best_accuracy" Global.save_inference_dir="/home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/inference/det_db_inference_dml_dml/"
>
> and try that. It may be that the pretrained model was not loaded.
>
> I changed it, but the result is the same. No improvement.

May I ask how you solved this?

keyfall commented 10 months ago

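Hey, do you still have the dataset? I have a similar project. Could you send me a copy? Thanks. If that is possible, please leave a reply; my email is 2536726426@qq.com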

PaddlePaddle / PaddleOCR

PP-OCRv3 det text detection teacher model DML training: accuracy stays around 0.6. How can I improve it? Please take a look. #9053