PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

OCR transfer learning converges, but recognition on the training set is inaccurate #2959

Closed. chocolate-byte closed this issue 3 years ago

chocolate-byte commented 3 years ago

As the title says: starting from the provided rec_mv3_none_bilstm_ctc.tar pretrained model, with the icdar2015 dataset and labels (about 4,000 training images), I edited the yml to load the pretrained model and trained for 300 epochs. Training converged: after fine-tuning, acc was around 0.99 and loss around 0.2, which looked pretty good, and the best checkpoint was epoch 102. However, both the epoch-102 and the epoch-300 models misrecognize roughly 90% or more of the icdar training-set images, never mind the test set. I ran out of GPU memory, so the batch size is currently 32. I'm new to OCR and not sure whether my parameters are wrong, so I would appreciate any advice.

PS: Does anyone know how to do transfer learning on the inference model for the digits 0-9 plus the decimal point? I want to fine-tune and swap out the lightweight model. Is it enough to train on single digits 0-9, laid out the same way as the icdar dataset? Also, PaddleOCR doesn't seem to have an introduction to image segmentation. Which file is it in?

The logs at epochs 102 and 300 are as follows.

102:
    iter: 14070, lr: 0.000500, loss: 0.636910, acc: 0.843750, norm_edit_dis: 0.962413, reader_cost: 0.00000 s, batch_cost: 0.06822 s, samples: 320, ips: 469.07642

300:
    [2021/05/26 13:14:24] root INFO: save model in ./output/rec/ic15/latest
    epoch: [300/300], iter: 41360, lr: 0.000500, loss: 0.020608, acc: 1.000000, norm_edit_dis: 1.000000, reader_cost: 0.00010 s, batch_cost: 0.10216 s, samples: 320, ips: 313.23260
    epoch: [300/300], iter: 41370, lr: 0.000500, loss: 0.032471, acc: 1.000000, norm_edit_dis: 1.000000, reader_cost: 0.00178 s, batch_cost: 0.09483 s, samples: 320, ips: 337.42971
    epoch: [300/300], iter: 41380, lr: 0.000500, loss: 0.032471, acc: 1.000000, norm_edit_dis: 1.000000, reader_cost: 0.00010 s, batch_cost: 0.09192 s, samples: 320, ips: 348.14319
    epoch: [300/300], iter: 41390, lr: 0.000500, loss: 0.023093, acc: 1.000000, norm_edit_dis: 1.000000, reader_cost: 0.00010 s, batch_cost: 0.07407 s, samples: 320, ips: 432.01792
    save model in ./output/rec/ic15/latest
    save model in ./output/rec/ic15/iter_epoch_300
    best metric, acc: 0.00732421875, norm_edit_dis: 0.12190923293828548, fps: 833.680501500976, best_epoch: 102
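The log above mixes two kinds of numbers: the per-batch acc printed during training, and the "best metric" line, which comes from evaluation on the Eval dataset. Below is a minimal Python sketch for pulling both out of a pasted log and comparing them. The helper name split_accuracies and the abbreviated LOG string are illustrative only and are not part of PaddleOCR.

```python
import re

# Log lines copied from the post above, trimmed to the fields used here.
LOG = """\
epoch: [300/300], iter: 41360, lr: 0.000500, loss: 0.020608, acc: 1.000000, norm_edit_dis: 1.000000
epoch: [300/300], iter: 41390, lr: 0.000500, loss: 0.023093, acc: 1.000000, norm_edit_dis: 1.000000
best metric, acc: 0.00732421875, norm_edit_dis: 0.12190923293828548, fps: 833.680501500976, best_epoch: 102
"""

# Matches the "acc: <number>" field on a log line.
ACC = re.compile(r"\bacc: ([0-9.]+)")


def split_accuracies(log_text):
    """Return (list of training-batch accuracies, best eval accuracy or None)."""
    train_accs = []
    best_eval = None
    for line in log_text.splitlines():
        match = ACC.search(line)
        if match is None:
            continue
        value = float(match.group(1))
        if "best metric" in line:
            best_eval = value  # reported from evaluation, not from training batches
        else:
            train_accs.append(value)
    return train_accs, best_eval


train_accs, best_eval = split_accuracies(LOG)
gap = max(train_accs) - best_eval
print("training-batch acc:", train_accs)
print("best eval acc:", best_eval)
print(f"gap: {gap:.4f}")
```

On these lines the gap is about 0.99: the model fits the training batches almost perfectly while scoring under 1% on evaluation.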

The yml and the commands I ran are below. The training-set image word_71.png reads PULL, but its recognition result is result: ('19', 0.92347664).

Training command:
    python tools/train.py -c configs/rec/rec_icdar15_train.yml

Inference command:
    (paddleocr2) D:\Download\paddleocr\PaddleOCR-release-2.1>python tools/infer_rec.py -c configs/rec/rec_icdar15_train.yml -o Global.pretrained_model=D:/Download/paddleocr/PaddleOCR-release-2.1/output/rec/ic15/best_accuracy Global.load_static_weights=false Global.infer_img=D:/Download/paddleocr/PaddleOCR-release-2.1/train_data/ic15_data/train/word_71.png

[2021/05/28 14:05:50] root INFO: Architecture :
[2021/05/28 14:05:50] root INFO:     Backbone :
[2021/05/28 14:05:50] root INFO:         model_name : large
[2021/05/28 14:05:50] root INFO:         name : MobileNetV3
[2021/05/28 14:05:50] root INFO:         scale : 0.5
[2021/05/28 14:05:50] root INFO:     Head :
[2021/05/28 14:05:50] root INFO:         fc_decay : 0
[2021/05/28 14:05:50] root INFO:         name : CTCHead
[2021/05/28 14:05:50] root INFO:     Neck :
[2021/05/28 14:05:50] root INFO:         encoder_type : rnn
[2021/05/28 14:05:50] root INFO:         hidden_size : 96
[2021/05/28 14:05:50] root INFO:         name : SequenceEncoder
[2021/05/28 14:05:50] root INFO:     Transform : None
[2021/05/28 14:05:50] root INFO:     algorithm : CRNN
[2021/05/28 14:05:50] root INFO:     model_type : rec
[2021/05/28 14:05:50] root INFO: Eval :
[2021/05/28 14:05:50] root INFO:     dataset :
[2021/05/28 14:05:50] root INFO:         data_dir : ./train_data/ic15_data
[2021/05/28 14:05:50] root INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
[2021/05/28 14:05:50] root INFO:         name : SimpleDataSet
[2021/05/28 14:05:50] root INFO:         transforms :
[2021/05/28 14:05:50] root INFO:             DecodeImage :
[2021/05/28 14:05:50] root INFO:                 channel_first : False
[2021/05/28 14:05:50] root INFO:                 img_mode : BGR
[2021/05/28 14:05:50] root INFO:             CTCLabelEncode : None
[2021/05/28 14:05:50] root INFO:             RecResizeImg :
[2021/05/28 14:05:50] root INFO:                 image_shape : [3, 32, 100]
[2021/05/28 14:05:50] root INFO:             KeepKeys :
[2021/05/28 14:05:50] root INFO:                 keep_keys : ['image', 'label', 'length']
[2021/05/28 14:05:50] root INFO:     loader :
[2021/05/28 14:05:50] root INFO:         batch_size_per_card : 32
[2021/05/28 14:05:50] root INFO:         drop_last : False
[2021/05/28 14:05:50] root INFO:         num_workers : 4
[2021/05/28 14:05:50] root INFO:         shuffle : False
[2021/05/28 14:05:50] root INFO:         use_shared_memory : False
[2021/05/28 14:05:50] root INFO: Global :
[2021/05/28 14:05:50] root INFO:     cal_metric_during_train : True
[2021/05/28 14:05:50] root INFO:     character_dict_path : ppocr/utils/ic15_dict.txt
[2021/05/28 14:05:50] root INFO:     character_type : ch
[2021/05/28 14:05:50] root INFO:     checkpoints : None
[2021/05/28 14:05:50] root INFO:     debug : False
[2021/05/28 14:05:50] root INFO:     distributed : False
[2021/05/28 14:05:50] root INFO:     epoch_num : 300
[2021/05/28 14:05:50] root INFO:     eval_batch_step : [0, 2000]
[2021/05/28 14:05:50] root INFO:     infer_img : D:/Download/paddleocr/PaddleOCR-release-2.1/train_data/ic15_data/train/word_71.png
[2021/05/28 14:05:50] root INFO:     infer_mode : False
[2021/05/28 14:05:50] root INFO:     load_static_weights : False
[2021/05/28 14:05:50] root INFO:     log_smooth_window : 20
[2021/05/28 14:05:50] root INFO:     max_text_length : 25
[2021/05/28 14:05:50] root INFO:     pretrained_model : D:/Download/paddleocr/PaddleOCR-release-2.1/output/rec/ic15/best_accuracy
[2021/05/28 14:05:50] root INFO:     print_batch_step : 10
[2021/05/28 14:05:50] root INFO:     save_epoch_step : 3
[2021/05/28 14:05:50] root INFO:     save_inference_dir : ./output/
[2021/05/28 14:05:50] root INFO:     save_model_dir : ./output/ic15_batch64/
[2021/05/28 14:05:50] root INFO:     save_res_path : ./output/rec/predicts_ic15.txt
[2021/05/28 14:05:50] root INFO:     use_gpu : True
[2021/05/28 14:05:50] root INFO:     use_space_char : False
[2021/05/28 14:05:50] root INFO:     use_visualdl : True
[2021/05/28 14:05:50] root INFO: Loss :
[2021/05/28 14:05:50] root INFO:     name : CTCLoss
[2021/05/28 14:05:50] root INFO: Metric :
[2021/05/28 14:05:50] root INFO:     main_indicator : acc
[2021/05/28 14:05:50] root INFO:     name : RecMetric
[2021/05/28 14:05:50] root INFO: Optimizer :
[2021/05/28 14:05:50] root INFO:     beta1 : 0.9
[2021/05/28 14:05:50] root INFO:     beta2 : 0.999
[2021/05/28 14:05:50] root INFO:     lr :
[2021/05/28 14:05:50] root INFO:         learning_rate : 0.005
[2021/05/28 14:05:50] root INFO:     name : Adam
[2021/05/28 14:05:50] root INFO:     regularizer :
[2021/05/28 14:05:50] root INFO:         factor : 0
[2021/05/28 14:05:50] root INFO:         name : L2
[2021/05/28 14:05:50] root INFO: PostProcess :
[2021/05/28 14:05:50] root INFO:     name : CTCLabelDecode
[2021/05/28 14:05:50] root INFO: Train :
[2021/05/28 14:05:50] root INFO:     dataset :
[2021/05/28 14:05:50] root INFO:         data_dir : ./train_data/ic15_data
[2021/05/28 14:05:50] root INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
[2021/05/28 14:05:50] root INFO:         name : SimpleDataSet
[2021/05/28 14:05:50] root INFO:         transforms :
[2021/05/28 14:05:50] root INFO:             DecodeImage :
[2021/05/28 14:05:50] root INFO:                 channel_first : False
[2021/05/28 14:05:50] root INFO:                 img_mode : BGR
[2021/05/28 14:05:50] root INFO:             CTCLabelEncode : None
[2021/05/28 14:05:50] root INFO:             RecResizeImg :
[2021/05/28 14:05:50] root INFO:                 image_shape : [3, 32, 100]
[2021/05/28 14:05:50] root INFO:             KeepKeys :
[2021/05/28 14:05:50] root INFO:                 keep_keys : ['image', 'label', 'length']
[2021/05/28 14:05:50] root INFO:     loader :
[2021/05/28 14:05:50] root INFO:         batch_size_per_card : 32
[2021/05/28 14:05:50] root INFO:         drop_last : True
[2021/05/28 14:05:50] root INFO:         num_workers : 8
[2021/05/28 14:05:50] root INFO:         shuffle : True
[2021/05/28 14:05:50] root INFO:         use_shared_memory : False
[2021/05/28 14:05:50] root INFO: train with paddle 2.1.0 and device CUDAPlace(0)
W0528 14:05:50.568840 15300 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.3, Runtime API Version: 10.1
W0528 14:05:50.584471 15300 device_context.cc:422] device: 0, cuDNN Version: 7.6.
[2021/05/28 14:05:52] root INFO: load pretrained model from ['D:/Download/paddleocr/PaddleOCR-release-2.1/output/rec/ic15/best_accuracy']
[2021/05/28 14:05:52] root INFO: infer_img: D:/Download/paddleocr/PaddleOCR-release-2.1/train_data/ic15_data/train/word_71.png
[2021/05/28 14:05:52] root INFO: result: ('19', 0.92347664)
[2021/05/28 14:05:52] root INFO: success!
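To put a number on how far a single prediction is from its label, a normalized edit-distance similarity can be computed by hand. The Python sketch below is illustrative: it assumes PULL is the ground-truth text of word_71.png, as the post suggests, and the exact formula PaddleOCR's RecMetric uses for norm_edit_dis may differ from this one. The function names are made up for the example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def norm_edit_similarity(pred, label):
    """1.0 means identical strings, 0.0 means nothing in common."""
    longest = max(len(pred), len(label))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, label) / longest


# Prediction from the log above versus the assumed label.
print(norm_edit_similarity("19", "PULL"))    # 0.0
print(norm_edit_similarity("PULL", "PULL"))  # 1.0
```

Running this over a sample of training images, rather than one image, gives a per-image view that the aggregated training log does not.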

The icdar yml is as follows:

Global:
  use_gpu: True
  epoch_num: 300
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec/ic15/
  save_epoch_step: 3
  # evaluation is run every 2000 iterations
  eval_batch_step: [0, 2000]
  cal_metric_during_train: True
  pretrained_model: ./pretrain_models/best_accuracy
  checkpoints:
  save_inference_dir: ./output/
  use_visualdl: True
  infer_img: doc/imgs_words_en/word_10.png
  # for data or label process
  character_dict_path: ppocr/utils/ic15_dict.txt
  character_type: ch
  max_text_length: 25
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_ic15.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    learning_rate: 0.005
  regularizer:
    name: 'L2'
    factor: 0

Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 96
  Head:
    name: CTCHead
    fc_decay: 0

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data
    label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
    transforms:

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data
    label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]
    transforms:
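Since the config ties the labels to character_dict_path and max_text_length, one cheap sanity check is whether every label line fits both. The Python sketch below assumes label lines of the form "image path, tab, text". The CHARSET here is a hypothetical stand-in, not the real contents of ppocr/utils/ic15_dict.txt, which should be loaded from the file itself; check_label_line is an illustrative helper, not a PaddleOCR function.

```python
# Hypothetical dictionary for illustration only; load the real one from
# the file named by character_dict_path (one character per line).
CHARSET = set("0123456789abcdefghijklmnopqrstuvwxyz")


def check_label_line(line, charset, max_text_length=25):
    """Return a list of human-readable problems found in one label line."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2:
        return ["expected exactly one tab between image path and text"]
    _path, text = parts
    problems = []
    if not text:
        problems.append("empty text")
    if len(text) > max_text_length:
        problems.append(f"text longer than {max_text_length} characters")
    unknown = sorted(set(text) - charset)
    if unknown:
        problems.append("characters not in dictionary: " + "".join(unknown))
    return problems


# Example lines; the paths and texts are made up for the demonstration.
for sample in ["train/word_1.png\tgenaxis", "train/word_2.png\tPULL", "train/word_3.png"]:
    print(repr(sample), "->", check_label_line(sample, CHARSET))
```

With this made-up dictionary an uppercase label is flagged, so it is worth checking how the real dictionary and the label encoder treat case before trusting the training accuracy.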

chocolate-byte commented 3 years ago

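I switched to a dataset of randomly generated multi-digit images, with about 10,000 training samples and 4,000 test samples. Each label line is the image name + \t + the random digits. The icdar labels don't use four-point annotations, so I didn't add any. Oddly, recognition still can't even get the number of digits right, even though by the second epoch of training acc shot up to 0.9 and loss dropped to 0.3. The evaluation output is below. Also, does the training set need manual four-point annotation of every digit?

eval model:: 98%|█████████▊| 46/47 [00:06<00:00, 12.19it/s]
eval model:: 98%|█████████▊| 46/47 [00:06<00:00, 7.36it/s]
[2021/06/02 16:00:15] root INFO: metric eval ***
[2021/06/02 16:00:15] root INFO: acc:0.9976222826086957
[2021/06/02 16:00:15] root INFO: norm_edit_dis:0.9996188103864734
[2021/06/02 16:00:15] root INFO: fps:508.3436185107257
INFO 2021-06-02 16:00:18,812 launch.py:266] Local processes completed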
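For reference, the label format described above (image name, a tab, then the digits) can be produced with a few lines of Python. This is only a sketch of the label side: the digits/img_00000.png naming scheme, the length range, and the make_digit_labels helper are all hypothetical, and rendering the matching images is left out.

```python
import random
from collections import Counter


def make_digit_labels(count, min_len=1, max_len=8, seed=0, with_decimal_point=False):
    """Build label lines of the form '<image path>\\t<digits>'."""
    rng = random.Random(seed)  # seeded so the label file is reproducible
    lines = []
    for index in range(count):
        length = rng.randint(min_len, max_len)
        text = "".join(rng.choice("0123456789") for _ in range(length))
        if with_decimal_point and length >= 2:
            # Put one decimal point strictly inside the digit string.
            pos = rng.randint(1, length - 1)
            text = text[:pos] + "." + text[pos:]
        lines.append(f"digits/img_{index:05d}.png\t{text}")
    return lines


labels = make_digit_labels(1000)
print("\n".join(labels[:3]))

# How many samples there are for each digit count.
length_counts = Counter(len(line.split("\t")[1]) for line in labels)
print(sorted(length_counts.items()))
```

Looking at the length histogram is a quick way to see whether every digit count is represented in the synthetic set.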

chocolate-byte commented 3 years ago

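Found the cause: the icdar dataset is too blurry, and 5,000 images are not enough. After enlarging the dataset with synthetic data, it worked fine. =_@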