PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
44.1k stars, 7.81k forks

Ultra-lightweight recognition model: eval acc=0 after changing the image size #2453

Closed zhaoyantao-murray closed 3 years ago

zhaoyantao-murray commented 3 years ago

I want to train a rec model on low-resolution images. When I run eval with the ultra-lightweight recognition model, everything works with image size=[3,32,320], but after changing to size=[3,16,160] eval reports acc=0, as shown in the screenshots: [image] [image]

When only the Train image size is changed to [3,16,160], eval acc=0:

(paddleocr) C:\Users\www19\PycharmProjects\PaddleOCR>python tools/train.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml
[2021/04/12 11:35:01] root INFO: Architecture :
[2021/04/12 11:35:01] root INFO:     Backbone :
[2021/04/12 11:35:01] root INFO:         model_name : small
[2021/04/12 11:35:01] root INFO:         name : MobileNetV3
[2021/04/12 11:35:01] root INFO:         scale : 0.5
[2021/04/12 11:35:01] root INFO:         small_stride : [1, 2, 2, 2]
[2021/04/12 11:35:01] root INFO:     Head :
[2021/04/12 11:35:01] root INFO:         fc_decay : 1e-05
[2021/04/12 11:35:01] root INFO:         name : CTCHead
[2021/04/12 11:35:01] root INFO:     Neck :
[2021/04/12 11:35:01] root INFO:         encoder_type : rnn
[2021/04/12 11:35:01] root INFO:         hidden_size : 48
[2021/04/12 11:35:01] root INFO:         name : SequenceEncoder
[2021/04/12 11:35:01] root INFO:     Transform : None
[2021/04/12 11:35:01] root INFO:     algorithm : CRNN
[2021/04/12 11:35:01] root INFO:     model_type : rec
[2021/04/12 11:35:01] root INFO: Eval :
[2021/04/12 11:35:01] root INFO:     dataset :
[2021/04/12 11:35:01] root INFO:         data_dir : D:/Dataset/LiquorBrand/rec/
[2021/04/12 11:35:01] root INFO:         label_file_list : ['D:/Dataset/LiquorBrand/rec/rec_gt.txt']
[2021/04/12 11:35:01] root INFO:         name : SimpleDataSet
[2021/04/12 11:35:01] root INFO:         transforms :
[2021/04/12 11:35:01] root INFO:             DecodeImage :
[2021/04/12 11:35:01] root INFO:                 channel_first : False
[2021/04/12 11:35:01] root INFO:                 img_mode : BGR
[2021/04/12 11:35:01] root INFO:             CTCLabelEncode : None
[2021/04/12 11:35:01] root INFO:             RecResizeImg :
[2021/04/12 11:35:01] root INFO:                 image_shape : [3, 32, 320]
[2021/04/12 11:35:01] root INFO:             KeepKeys :
[2021/04/12 11:35:01] root INFO:                 keep_keys : ['image', 'label', 'length']
[2021/04/12 11:35:01] root INFO:     loader :
[2021/04/12 11:35:01] root INFO:         batch_size_per_card : 32
[2021/04/12 11:35:01] root INFO:         drop_last : False
[2021/04/12 11:35:01] root INFO:         num_workers : 8
[2021/04/12 11:35:01] root INFO:         shuffle : False
[2021/04/12 11:35:01] root INFO: Global :
[2021/04/12 11:35:01] root INFO:     cal_metric_during_train : True
[2021/04/12 11:35:01] root INFO:     character_dict_path : ppocr/utils/ppocr_keys_v1.txt
[2021/04/12 11:35:01] root INFO:     character_type : ch
[2021/04/12 11:35:01] root INFO:     checkpoints : None
[2021/04/12 11:35:01] root INFO:     debug : False
[2021/04/12 11:35:01] root INFO:     distributed : False
[2021/04/12 11:35:01] root INFO:     epoch_num : 100
[2021/04/12 11:35:01] root INFO:     eval_batch_step : 100
[2021/04/12 11:35:01] root INFO:     infer_img : doc/imgs_words/ch/word_1.jpg
[2021/04/12 11:35:01] root INFO:     infer_mode : False
[2021/04/12 11:35:01] root INFO:     log_smooth_window : 100
[2021/04/12 11:35:01] root INFO:     max_text_length : 50
[2021/04/12 11:35:01] root INFO:     pretrained_model : ./pretrained_models/ch_ppocr_mobile_v2.0_rec_train/best_accuracy
[2021/04/12 11:35:01] root INFO:     print_batch_step : 100
[2021/04/12 11:35:01] root INFO:     save_epoch_step : 100
[2021/04/12 11:35:01] root INFO:     save_inference_dir : None
[2021/04/12 11:35:01] root INFO:     save_model_dir : ./output/rec_chinese_lite_v2.0
[2021/04/12 11:35:01] root INFO:     use_gpu : True
[2021/04/12 11:35:01] root INFO:     use_space_char : True
[2021/04/12 11:35:01] root INFO:     use_visualdl : True
[2021/04/12 11:35:01] root INFO: Loss :
[2021/04/12 11:35:01] root INFO:     name : CTCLoss
[2021/04/12 11:35:01] root INFO: Metric :
[2021/04/12 11:35:01] root INFO:     main_indicator : acc
[2021/04/12 11:35:01] root INFO:     name : RecMetric
[2021/04/12 11:35:01] root INFO: Optimizer :
[2021/04/12 11:35:01] root INFO:     beta1 : 0.9
[2021/04/12 11:35:01] root INFO:     beta2 : 0.999
[2021/04/12 11:35:01] root INFO:     lr :
[2021/04/12 11:35:01] root INFO:         learning_rate : 0.001
[2021/04/12 11:35:01] root INFO:         name : Cosine
[2021/04/12 11:35:01] root INFO:     name : Adam
[2021/04/12 11:35:01] root INFO:     regularizer :
[2021/04/12 11:35:01] root INFO:         factor : 1e-05
[2021/04/12 11:35:01] root INFO:         name : L2
[2021/04/12 11:35:01] root INFO: PostProcess :
[2021/04/12 11:35:01] root INFO:     name : CTCLabelDecode
[2021/04/12 11:35:01] root INFO: Train :
[2021/04/12 11:35:01] root INFO:     dataset :
[2021/04/12 11:35:01] root INFO:         data_dir : D:/Dataset/LabelImgsSet1+3/rec/
[2021/04/12 11:35:01] root INFO:         label_file_list : ['D:/Dataset/LabelImgsSet1+3/rec/rec_gt.txt']
[2021/04/12 11:35:01] root INFO:         name : SimpleDataSet
[2021/04/12 11:35:01] root INFO:         transforms :
[2021/04/12 11:35:01] root INFO:             DecodeImage :
[2021/04/12 11:35:01] root INFO:                 channel_first : False
[2021/04/12 11:35:01] root INFO:                 img_mode : BGR
[2021/04/12 11:35:01] root INFO:             RecAug : None
[2021/04/12 11:35:01] root INFO:             CTCLabelEncode : None
[2021/04/12 11:35:01] root INFO:             RecResizeImg :
[2021/04/12 11:35:01] root INFO:                 image_shape : [3, 16, 160]
[2021/04/12 11:35:01] root INFO:             KeepKeys :
[2021/04/12 11:35:01] root INFO:                 keep_keys : ['image', 'label', 'length']
[2021/04/12 11:35:01] root INFO:     loader :
[2021/04/12 11:35:01] root INFO:         batch_size_per_card : 32
[2021/04/12 11:35:01] root INFO:         drop_last : True
[2021/04/12 11:35:01] root INFO:         num_workers : 8
[2021/04/12 11:35:01] root INFO:         shuffle : True
[2021/04/12 11:35:01] root INFO: train with paddle 2.0.0 and device CUDAPlace(0)
[2021/04/12 11:35:01] root INFO: Initialize indexs of datasets:['D:/Dataset/LabelImgsSet1+3/rec/rec_gt.txt']
[2021/04/12 11:35:01] root INFO: Initialize indexs of datasets:['D:/Dataset/LiquorBrand/rec/rec_gt.txt']
W0412 11:35:01.595120 8608 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.1, Runtime API Version: 10.1
W0412 11:35:01.609120 8608 device_context.cc:372] device: 0, cuDNN Version: 7.6.
[2021/04/12 11:35:08] root INFO: load pretrained model from ['./pretrained_models/ch_ppocr_mobile_v2.0_rec_train/best_accuracy']
[2021/04/12 11:35:08] root INFO: train dataloader has 98 iters, valid dataloader has 34 iters
[2021/04/12 11:35:08] root INFO: Initialize indexs of datasets:['D:/Dataset/LabelImgsSet1+3/rec/rec_gt.txt']
[2021/04/12 11:35:59] root INFO: save model in ./output/rec_chinese_lite_v2.0\latest
[2021/04/12 11:35:59] root INFO: Initialize indexs of datasets:['D:/Dataset/LabelImgsSet1+3/rec/rec_gt.txt']
[2021/04/12 11:36:00] root INFO: epoch: [2/100], iter: 100, lr: 0.001000, loss: 39.212029, acc: 0.062500, norm_edit_dis: 0.434699, reader_cost: 0.00150 s, batch_cost: 0.01099 s, samples: 96, ips: 87.35561
eval model:: 100%|██████████| 34/34 [00:06<00:00, 4.87it/s]
[2021/04/12 11:36:07] root INFO: cur metric, acc: 0.0, norm_edit_dis: 0.002772043585472561, fps: 160.15878464027196
[2021/04/12 11:36:08] root INFO: save best model is to ./output/rec_chinese_lite_v2.0\best_accuracy
[2021/04/12 11:36:08] root INFO: best metric, acc: 0.0, norm_edit_dis: 0.002772043585472561, fps: 160.15878464027196, best_epoch: 2
[2021/04/12 11:36:38] root INFO: save model in ./output/rec_chinese_lite_v2.0\latest
[2021/04/12 11:36:38] root INFO: Initialize indexs of datasets:['D:/Dataset/LabelImgsSet1+3/rec/rec_gt.txt']
[2021/04/12 11:36:40] root INFO: epoch: [3/100], iter: 200, lr: 0.000999, loss: 24.938742, acc: 0.187500, norm_edit_dis: 0.631264, reader_cost: 0.00093 s, batch_cost: 0.01738 s, samples: 160, ips: 92.05993
eval model:: 100%|██████████| 34/34 [00:07<00:00, 4.75it/s]
[2021/04/12 11:36:47] root INFO: cur metric, acc: 0.0, norm_edit_dis: 0.001614020783662351, fps: 157.42807800876517
[2021/04/12 11:36:47] root INFO: save best model is to ./output/rec_chinese_lite_v2.0\best_accuracy
[2021/04/12 11:36:47] root INFO: best metric, acc: 0.0, norm_edit_dis: 0.001614020783662351, fps: 157.42807800876517, best_epoch: 3
[2021/04/12 11:37:19] root INFO: save model in ./output/rec_chinese_lite_v2.0\latest
[2021/04/12 11:37:19] root INFO: Initialize indexs of datasets:['D:/Dataset/LabelImgsSet1+3/rec/rec_gt.txt']
[2021/04/12 11:37:22] root INFO: epoch: [4/100], iter: 300, lr: 0.000998, loss: 20.342499, acc: 0.281250, norm_edit_dis: 0.706760, reader_cost: 0.00256 s, batch_cost: 0.02804 s, samples: 224, ips: 79.88187
eval model:: 100%|██████████| 34/34 [00:06<00:00, 4.87it/s]
[2021/04/12 11:37:29] root INFO: cur metric, acc: 0.0, norm_edit_dis: 0.003546549826524137, fps: 157.90355490076698
[2021/04/12 11:37:30] root INFO: save best model is to ./output/rec_chinese_lite_v2.0\best_accuracy
[2021/04/12 11:37:30] root INFO: best metric, acc: 0.0, norm_edit_dis: 0.003546549826524137, fps: 157.90355490076698, best_epoch: 4
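
The config dump in the log shows Train using image_shape [3, 16, 160] while Eval still uses [3, 32, 320]. A mismatch like this is easy to miss in a long YAML file, so here is a minimal sketch of a check that could be run on the config before training. It assumes the config has already been loaded into nested dicts and lists laid out as in the log; `rec_image_shape` and `shapes_aligned` are hypothetical helper names, not part of PaddleOCR.

```python
def rec_image_shape(config, section):
    """Return the RecResizeImg image_shape under config[section], or None."""
    transforms = config[section]["dataset"]["transforms"]
    for op in transforms:
        # Each transform is a one-key mapping such as {"RecResizeImg": {...}};
        # ops without arguments (e.g. CTCLabelEncode) map to None.
        params = op.get("RecResizeImg")
        if params is not None:
            return list(params["image_shape"])
    return None


def shapes_aligned(config):
    """True when Train and Eval resize to the same image_shape."""
    return rec_image_shape(config, "Train") == rec_image_shape(config, "Eval")


# Only the relevant keys, with the values printed in the log.
config = {
    "Train": {"dataset": {"transforms": [
        {"DecodeImage": {"img_mode": "BGR", "channel_first": False}},
        {"RecAug": None},
        {"CTCLabelEncode": None},
        {"RecResizeImg": {"image_shape": [3, 16, 160]}},
        {"KeepKeys": {"keep_keys": ["image", "label", "length"]}},
    ]}},
    "Eval": {"dataset": {"transforms": [
        {"DecodeImage": {"img_mode": "BGR", "channel_first": False}},
        {"CTCLabelEncode": None},
        {"RecResizeImg": {"image_shape": [3, 32, 320]}},
        {"KeepKeys": {"keep_keys": ["image", "label", "length"]}},
    ]}},
}

print("Train:", rec_image_shape(config, "Train"))
print("Eval: ", rec_image_shape(config, "Eval"))
print("aligned:", shapes_aligned(config))
```

For this run the check reports the two sections as not aligned, which matches the setup described in the post.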

zhaoyantao-murray commented 3 years ago

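Problem solved. The pretrained model provided by PaddlePaddle uses size=[3,32,320], but I used it directly to eval images at size=[3,16,160], so acc=0 is expected. For training, keep the Train and Eval sizes aligned and train for a few more epochs, and acc will start to show up; it is 0 for the first few epochs.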
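
To get a feel for how different the two input sizes are, here is a rough sketch of the size arithmetic. It assumes that RecResizeImg scales each crop to the target height while keeping its aspect ratio, caps the width at the target width, and pads the remainder; that is my understanding of the transform and is not confirmed in this thread. `resized_size` is a hypothetical helper and the 48x240 crop is made up for illustration.

```python
import math


def resized_size(src_h, src_w, image_shape):
    """Height and width of the scaled crop before padding out to the target width.

    Assumption: the crop is scaled to the target height with its aspect ratio
    preserved, and the width is capped at the target width.
    """
    _, img_h, img_w = image_shape
    ratio = src_w / float(src_h)
    resized_w = min(img_w, int(math.ceil(img_h * ratio)))
    return img_h, resized_w


crop_h, crop_w = 48, 240  # hypothetical text crop, not taken from the dataset
for shape in ([3, 32, 320], [3, 16, 160]):
    print(shape, "->", resized_size(crop_h, crop_w, shape))
```

Under that assumption the same crop reaches the network at half the height and half the width when the shape is [3,16,160], so weights trained at [3,32,320] see characters at a scale they were never trained on until fine-tuning at the new size has run for a while.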

Hupengyu commented 3 years ago

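> Problem solved. The pretrained model provided by PaddlePaddle uses size=[3,32,320], but I used it directly to eval images at size=[3,16,160], so acc=0 is expected. For training, keep the Train and Eval sizes aligned and train for a few more epochs, and acc will start to show up; it is 0 for the first few epochs.

Why is that? The sizes in the yml files on the official site are all the same. Did you change them yourself earlier?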