PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
44.27k stars 7.82k forks

DB++ performs very poorly after conversion to an inference model #8991

Closed LUXUS1 closed 1 year ago

LUXUS1 commented 1 year ago

DB++ gives good results when I run inference with infer_det.py, but the results are very poor after converting it to an inference model. I also tried aligning the parameters, but the results did not change.

Detection results: image image

Commands:
python3 tools/infer_det.py -c configs/det/det_r50_db++_td_tr.yml -o Global.infer_img="./images/3.jpg" Global.pretrained_model="./seal_models/DB++/best_accuracy"
python3 tools/export_model.py -c configs/det/det_r50_db++_td_tr.yml -o Global.pretrained_model=./seal_models/DB++/best_accuracy Global.save_inference_dir=./inference/det_db++/
python3 tools/infer/predict_det.py --image_dir="./images/3.jpg" --det_model_dir="./inference/det_db++/" --det_algorithm="DB++" --use_gpu=false

Configuration (det_r50_db++_td_tr.yml):
Global:
  debug: false
  use_gpu: False
  epoch_num: 1000
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/det_r50_td_tr/
  save_epoch_step: 200
  eval_batch_step:

--predict.py image image

andyjiang1116 commented 1 year ago

The shape used for training and the shape used for prediction must be kept consistent.

LDOUBLEV commented 1 year ago

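At prediction time, add the parameter "image_shape": [736, 736] here so that it matches the shape in the training config: https://github.com/PaddlePaddle/PaddleOCR/blob/0ed9d8889f7ba90949e64fa6e3240df282983a77/tools/infer/predict_det.py#L44

By default the code always predicts with dynamic shapes, so no parameter for fixed-shape prediction is set. The DetResizeForTest function supports passing in a custom shape.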
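The suggestion above can be sketched as follows. This is only an illustration under stated assumptions, not the actual contents of predict_det.py: the list name default_pre_process, the helper with_fixed_shape, and the limit_side_len / limit_type placeholder values are hypothetical. Only the DetResizeForTest keys image_shape and keep_ratio are taken from the Eval section of the config posted in this thread.

```python
import copy

# Hypothetical stand-in for the preprocessing list built in predict_det.py.
# The dynamic-shape keys and their values here are placeholders.
default_pre_process = [
    {"DetResizeForTest": {"limit_side_len": 960, "limit_type": "max"}},
    {"ToCHWImage": None},
]


def with_fixed_shape(pre_process_list, image_shape, keep_ratio=True):
    """Return a copy whose DetResizeForTest entry resizes to a fixed shape."""
    patched = copy.deepcopy(pre_process_list)
    for op in patched:
        if "DetResizeForTest" in op:
            # Mirror the Eval transform from the training config:
            # image_shape: [736, 736], keep_ratio: True
            op["DetResizeForTest"] = {
                "image_shape": list(image_shape),
                "keep_ratio": keep_ratio,
            }
    return patched


fixed = with_fixed_shape(default_pre_process, (736, 736))
print(fixed[0])
```

The helper copies the list instead of editing it in place, so the dynamic-shape defaults stay available for other detection algorithms.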

LUXUS1 commented 1 year ago

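Thanks, but I have now run into new problems:

  1. When predicting the same image with "--det_box_type", type=str, default='poly', the following error is shown:

image After changing it to "--det_box_type", type=str, default='quad', the error disappears, but only quadrilateral text can be detected, as shown below: image

  2. With "--det_box_type", type=str, default='poly', a few images are detected normally, as shown below: image But most images cannot be detected.

LDOUBLEV commented 1 year ago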

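Does prediction with the trained model also raise an error?

Print the shape of dt_boxes_new just before the error and check whether its first dimension is 0.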

LUXUS1 commented 1 year ago

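Both the pretrained model and this Chinese model raise the error.

  1. Pretrained model: image

  2. Chinese model: image

  3. Output for the few Chinese images that can be detected: image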

LDOUBLEV commented 1 year ago

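Delete the line dt_boxes = np.array(dt_boxes_new) and return dt_boxes_new directly.

The reason is that the predicted boxes do not all have the same number of points, so they cannot be combined directly into a single array.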
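A minimal sketch of why the list has to be returned as-is. The function filter_polys and the sample coordinates are hypothetical and are not PaddleOCR code; the point is only that polygons with different vertex counts have different shapes, so they are kept in a plain Python list rather than stacked with np.array.

```python
import numpy as np


def filter_polys(polys, min_points=3):
    """Hypothetical filter: keep valid polygons and return them as a list."""
    kept = []
    for poly in polys:
        poly = np.asarray(poly, dtype=np.float32)
        # Each polygon is an (N, 2) array; N may differ between polygons.
        if poly.ndim == 2 and poly.shape[0] >= min_points:
            kept.append(poly)
    # Return the list itself. Stacking with np.array(kept) only works
    # when every polygon has the same number of points.
    return kept


polys = [
    [[0, 0], [10, 0], [10, 5], [0, 5]],                   # 4 points
    [[0, 0], [8, 0], [12, 4], [8, 8], [0, 8], [-4, 4]],   # 6 points
]
dt_boxes = filter_polys(polys)
print([b.shape for b in dt_boxes])
```

Printing the per-box shapes this way is also a quick check of whether any boxes were detected at all, since an empty list means nothing survived filtering.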

LUXUS1 commented 1 year ago

感谢!

LUXUS1 commented 1 year ago

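For the previous problem, deleting dt_boxes = np.array(dt_boxes_new) and returning dt_boxes_new directly does solve it. However, when I use predict_system.py to chain DB++ with SVTR, the modified location raises an error. How can this be solved? Thanks. image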

LUXUS1 commented 1 year ago

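For the previous problem, deleting dt_boxes = np.array(dt_boxes_new) and returning dt_boxes_new directly does solve it. However, when I use predict_system.py to chain DB++ with SVTR, the modified location raises an error. How can this be solved? Thanks. image

Change num_boxes = dt_boxes.shape[0] to num_boxes = len(dt_boxes).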
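The reason this change works can be sketched as follows: a plain list has no .shape attribute, while len() accepts both a NumPy array and a list. The helper count_boxes and the sample data are hypothetical and are not part of predict_system.py.

```python
import numpy as np


def count_boxes(dt_boxes):
    """Number of detected boxes, for an ndarray or a list of arrays."""
    return 0 if dt_boxes is None else len(dt_boxes)


# 'quad' style output: every box has 4 points, so one array can hold them.
quad_boxes = np.zeros((3, 4, 2), dtype=np.float32)
# 'poly' style output: point counts differ, so the boxes stay in a list.
poly_boxes = [np.zeros((4, 2)), np.zeros((7, 2))]

print(count_boxes(quad_boxes), count_boxes(poly_boxes))
```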

papersuper commented 1 year ago

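At prediction time, add the parameter "image_shape": [736, 736] here so that it matches the shape in the training config:

https://github.com/PaddlePaddle/PaddleOCR/blob/0ed9d8889f7ba90949e64fa6e3240df282983a77/tools/infer/predict_det.py#L44

By default the code always predicts with dynamic shapes, so no parameter for fixed-shape prediction is set. The DetResizeForTest function supports passing in a custom shape.

Hello, may I ask: do I add "image_shape": [512, 512] inside DetResizeForTest in PaddleOCR/tools/infer/predict_det.py? Mine raises an error.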

Homura852 commented 7 months ago

@LUXUS1 Hi, could you share a way to contact you so we can discuss DBnet++ training and inference?

kkiskkk commented 2 months ago

DB++ gives good results when I run inference with infer_det.py, but the results are very poor after converting it to an inference model. I also tried aligning the parameters, but the results did not change.

Detection results: image image

Commands:
python3 tools/infer_det.py -c configs/det/det_r50_db++_td_tr.yml -o Global.infer_img="./images/3.jpg" Global.pretrained_model="./seal_models/DB++/best_accuracy"
python3 tools/export_model.py -c configs/det/det_r50_db++_td_tr.yml -o Global.pretrained_model=./seal_models/DB++/best_accuracy Global.save_inference_dir=./inference/det_db++/
python3 tools/infer/predict_det.py --image_dir="./images/3.jpg" --det_model_dir="./inference/det_db++/" --det_algorithm="DB++" --use_gpu=false

Configuration (det_r50_db++_td_tr.yml):
Global:
  debug: false
  use_gpu: False
  epoch_num: 1000
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/det_r50_td_tr/
  save_epoch_step: 200
  eval_batch_step:
  - 0
  - 2000
  cal_metric_during_train: false
  pretrained_model: ./pretrain_models/ResNet50_dcn_asf_synthtext_pretrained
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./checkpoints/det_db/predicts_db.txt
Architecture:
  model_type: det
  algorithm: DB++
  Transform: null
  Backbone:
    name: ResNet
    layers: 50
    dcn_stage: [False, True, True, True]
  Neck:
    name: DBFPN
    out_channels: 256
    use_asf: True
  Head:
    name: DBHead
    k: 50
Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: BCELoss
  alpha: 5
  beta: 10
  ohem_ratio: 3
Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: DecayLearningRate
    learning_rate: 0.007
    epochs: 1000
    factor: 0.9
    end_lr: 0
  weight_decay: 0.0001
PostProcess:
  name: DBPostProcess  # post-processing class name
  thresh: 0.3  # threshold used in DBPostProcess to binarize the segmentation map
  box_thresh: 0.2  # threshold used in DBPostProcess to filter output boxes; boxes below it are not output
  max_candidates: 1000
  unclip_ratio: 2.0
  det_box_type: 'poly'  # 'quad' or 'poly'
Metric:
  name: DetMetric
  main_indicator: hmean
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    label_file_list:
    - ./train_data/TD_TR/TD500/train_gt_labels.txt
    - ./train_data/TD_TR/TR400/gt_labels.txt
    ratio_list:
    - 1.0
    - 1.0
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - IaaAugment:
        augmenter_args:
        - type: Fliplr
          args:
            p: 0.5
        - type: Affine
          args:
            rotate:
            - -10
            - 10
        - type: Resize
          args:
            size:
            - 0.5
            - 3
    - EastRandomCropData:
        size:
        - 640
        - 640
        max_tries: 10
        keep_ratio: true
    - MakeShrinkMap:
        shrink_ratio: 0.4
        min_text_size: 8
    - MakeBorderMap:
        shrink_ratio: 0.4
        thresh_min: 0.3
        thresh_max: 0.7
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.48109378172549
        - 0.45752457890196
        - 0.40787054090196
        std:
        - 1.0
        - 1.0
        - 1.0
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - threshold_map
        - threshold_mask
        - shrink_map
        - shrink_mask
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 4
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    label_file_list:
    - ./train_data/TD_TR/TD500/test_gt_labels.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - DetResizeForTest:
        image_shape:
        - 736
        - 736
        keep_ratio: True
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.48109378172549
        - 0.45752457890196
        - 0.40787054090196
        std:
        - 1.0
        - 1.0
        - 1.0
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - shape
        - polys
        - ignore_tags
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 2
profiler_options: null

--predict.py image image

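Could you share these weights? With the ones I downloaded from the official release, even when det_box_type is set to 'poly', the detection results still contain only rectangular boxes.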