PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkit based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
Apache License 2.0

If the images to be recognized vary a lot in size (small: 693*365, medium: 1565*951, large: 4028*3120), how should det_limit_side_len and det_limit_type be set at prediction time? #11760

Closed: rexzhengzhihong closed this issue 4 months ago

rexzhengzhihong commented 4 months ago

Please provide the following information to quickly locate the problem

The images to be recognized vary quite a lot in size: small ones are about 693*365, medium ones 1565*951, and large ones 4028*3120. How should det_limit_side_len and det_limit_type be set at prediction time?

With det_limit_side_len=736.0, det_limit_type='min', prediction on the small images is fine, but accuracy on the large images is fairly low. After changing to det_limit_side_len=960.0, det_limit_type='max', it is the other way around: accuracy on the small images drops.
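
For context, this is roughly how the two settings described above are passed when using the paddleocr Python package; a minimal sketch, with the image path as a placeholder:

```python
from paddleocr import PaddleOCR

# det_limit_type='min': scale up so the shorter side is at least det_limit_side_len.
# Fine for the small 693*365 images, but the 4028*3120 scans are left at full size.
ocr_min = PaddleOCR(det_limit_side_len=736, det_limit_type='min')

# det_limit_type='max': scale down so the longer side is at most det_limit_side_len.
# Fine for the large scans, but small images are not enlarged at all.
ocr_max = PaddleOCR(det_limit_side_len=960, det_limit_type='max')

result = ocr_max.ocr('sample.jpg')
```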


GreatV commented 4 months ago

Generally 960 with max is enough.

rexzhengzhihong commented 4 months ago

With 960 max, most of my images are only five or six hundred pixels on a side, and the recognition accuracy is a bit low.

GreatV commented 4 months ago

https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/inference_args.md

GreatV commented 4 months ago

With 960 max, an image is only rescaled if it exceeds 960; images below 960 should be unaffected.
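
To make that rule concrete, here is a simplified sketch of the pre-detection size computation (modeled on the DetResizeForTest transform, not the exact library code):

```python
def det_resize_shape(h, w, limit_side_len=960, limit_type="max"):
    """Simplified version of the resize rule driven by det_limit_side_len / det_limit_type."""
    if limit_type == "max":
        # only shrink images whose longer side exceeds the limit
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    else:  # "min"
        # only enlarge images whose shorter side is below the limit
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    # the detector expects side lengths that are multiples of 32
    new_h = max(int(round(h * ratio / 32)) * 32, 32)
    new_w = max(int(round(w * ratio / 32)) * 32, 32)
    return new_h, new_w

print(det_resize_shape(365, 693))    # small image: (352, 704), essentially unchanged
print(det_resize_shape(3120, 4028))  # large scan:  (736, 960), shrunk to the limit
```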

rexzhengzhihong commented 4 months ago

Then maybe something is wrong with my training parameters?

Global:
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 2
  save_model_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/model/det/output/ch_db_mv3/
  save_epoch_step: 1200
  eval_batch_step:
  - 0
  - 20
  cal_metric_during_train: false
  pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./output/det_db/predicts_db.txt
Architecture:
  name: DistillationModel
  algorithm: Distillation
  model_type: det
  Models:
    Student:
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list:
        - 7
        - 2
        - 2
        k: 50
      pretrained: ./pretrain_models/ResNet50_vd_ssld_pretrained
    Student2:
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list:
        - 7
        - 2
        - 2
        k: 50
      pretrained: ./pretrain_models/ResNet50_vd_ssld_pretrained
Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDMLLoss:
      model_name_pairs:
      - Student
      - Student2
      maps_name: thrink_maps
      weight: 1.0
      key: maps
  - DistillationDBLoss:
      weight: 1.0
      model_name_list:
      - Student
      - Student2
      name: DBLoss
      balance_loss: true
      main_loss_type: DiceLoss
      alpha: 5
      beta: 10
      ohem_ratio: 3
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 0
PostProcess:
  name: DistillationDBPostProcess
  model_name:
  - Student
  - Student2
  key: head_out
  thresh: 0.3
  box_thresh: 0.7
  max_candidates: 1000
  unclip_ratio: 1.5
Metric:
  name: DistillationMetric
  base_metric_name: DetMetric
  main_indicator: hmean
  key: Student
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det
    label_file_list:
    - /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det/train.txt
    ratio_list:
    - 1.0
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - CopyPaste: null
    - IaaAugment:
        augmenter_args:
        - type: Fliplr
          args:
            p: 0.5
        - type: Affine
          args:
            rotate:
            - -10
            - 10
        - type: Resize
          args:
            size:
            - 0.5
            - 3
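    # Training-time random crop: every training sample becomes a 960x960 patch here.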
    - EastRandomCropData:
        size:
        - 960
        - 960
        max_tries: 50
        keep_ratio: true
    - MakeBorderMap:
        shrink_ratio: 0.4
        thresh_min: 0.3
        thresh_max: 0.7
    - MakeShrinkMap:
        shrink_ratio: 0.4
        min_text_size: 8
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - threshold_map
        - threshold_mask
        - shrink_map
        - shrink_mask
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 2
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det
    label_file_list:
    - /home/DiskA/zncsPython/picture_ocr/zzszyfp_v1/split_data/det/val.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - DetResizeForTest: null
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - shape
        - polys
        - ignore_tags
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 2

rexzhengzhihong commented 4 months ago

Which parameter controls the scaling during training?

GreatV commented 4 months ago

How much data do you have?

GreatV commented 4 months ago

    - EastRandomCropData:
        size:
        - 960
        - 960
        max_tries: 50

rexzhengzhihong commented 4 months ago

A few hundred images of accounting documents (invoices/receipts).

GreatV commented 4 months ago

Adding more data might help.

rexzhengzhihong commented 4 months ago

With 960 max, the det detection results are fairly poor. The images being recognized are around 700*400. Where might the problem be?

GreatV commented 4 months ago

That generally doesn't need to be changed.

GreatV commented 4 months ago

You could also try a different detector.

rexzhengzhihong commented 4 months ago

It feels like different image sizes should use different parameters, right? Otherwise everything gets scaled to a fixed size.

tink2123 commented 4 months ago

If many of the inference images are high-resolution, you can increase the long-side limit, e.g. det_limit_side_len=2000, det_limit_type='max'.
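
For reference, a minimal sketch of applying this via the paddleocr Python package; the same values can be passed as --det_limit_side_len / --det_limit_type to the tools/infer scripts (the image path is a placeholder):

```python
from paddleocr import PaddleOCR

# Let the longer side stay up to 2000 px before the detector shrinks it,
# so high-resolution scans keep more detail.
ocr = PaddleOCR(det_limit_side_len=2000, det_limit_type='max')
result = ocr.ocr('high_res_scan.jpg')
```

The trade-off is that a larger limit feeds more pixels to the detector, so inference is slower and uses more memory.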

rexzhengzhihong commented 4 months ago

> If many of the inference images are high-resolution, you can increase the long-side limit, e.g. det_limit_side_len=2000, det_limit_type='max'.

OK, I'll give it a try.

tink2123 commented 4 months ago

Closing this issue for now. Feel free to reopen it if needed.

rexzhengzhihong commented 3 months ago

> If many of the inference images are high-resolution, you can increase the long-side limit, e.g. det_limit_side_len=2000, det_limit_type='max'.

That causes problems for the low-resolution images. The small images are 693*365, but during training EastRandomCropData uses size: [960, 960], so they seem to be trained after being enlarged. When I set det_limit_type='max', an image is only rescaled if it is larger than det_limit_side_len; this 693*365 image is smaller than det_limit_side_len, so at inference time the text detection accuracy is fairly low. Should this parameter be varied according to the actual size of the image being inferred, or should I check the image size before inference and enlarge small images before running them through inference?
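
As an illustration of the second option (check the size first and enlarge small images before inference), here is a rough sketch; the 700-pixel threshold and the file name are assumptions for illustration, not values recommended in this thread:

```python
import cv2
from paddleocr import PaddleOCR

ocr = PaddleOCR(det_limit_side_len=2000, det_limit_type='max')

def ocr_any_size(path, min_short_side=700):
    """Enlarge small images before detection so text is closer to the scale seen in training."""
    img = cv2.imread(path)
    h, w = img.shape[:2]
    if min(h, w) < min_short_side:
        # e.g. a 693*365 image is scaled up by ~1.9x before being passed to the detector
        scale = min_short_side / min(h, w)
        img = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
    return ocr.ocr(img)

result = ocr_any_size('small_invoice.jpg')
```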

rexzhengzhihong commented 3 months ago

@tink2123

Sencc commented 2 months ago

@rexzhengzhihong Regarding training with images of different resolutions, did you solve this? Did the results improve?