PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
44.26k stars 7.82k forks

Text recognition fine-tuning: accuracy is 0 #8247

Closed jackliu1111 closed 10 months ago

jackliu1111 commented 2 years ago

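Please provide the following information to quickly locate the problem

I selected 400,000 real images from a vertical domain to fine-tune the text recognition model. When choosing the fine-tuning parameters, I picked the pretrained model and the config file according to the official documentation at https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/finetune.md, as shown: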

image

Problems observed:
1. During training, acc stays at 0.
2. A warning says that the model parameters do not match the network parameters, as shown: image
I'm not sure whether these two behaviors are normal.

The network parameters are as follows:
[2022/11/09 15:51:53] ppocr INFO: encoder_type : rnn
[2022/11/09 15:51:53] ppocr INFO: hidden_size : 64
[2022/11/09 15:51:53] ppocr INFO: name : SequenceEncoder
[2022/11/09 15:51:53] ppocr INFO: Transform : None
[2022/11/09 15:51:53] ppocr INFO: algorithm : CRNN
[2022/11/09 15:51:53] ppocr INFO: freeze_params : False
[2022/11/09 15:51:53] ppocr INFO: model_type : rec
[2022/11/09 15:51:53] ppocr INFO: pretrained : None
[2022/11/09 15:51:53] ppocr INFO: return_all_feats : True
[2022/11/09 15:51:53] ppocr INFO: Teacher :
[2022/11/09 15:51:53] ppocr INFO: Backbone :
[2022/11/09 15:51:53] ppocr INFO: name : MobileNetV1Enhance
[2022/11/09 15:51:53] ppocr INFO: scale : 0.5
[2022/11/09 15:51:53] ppocr INFO: Head :
[2022/11/09 15:51:53] ppocr INFO: fc_decay : 2e-05
[2022/11/09 15:51:53] ppocr INFO: mid_channels : 96
[2022/11/09 15:51:53] ppocr INFO: name : CTCHead
[2022/11/09 15:51:53] ppocr INFO: Neck :
[2022/11/09 15:51:53] ppocr INFO: encoder_type : rnn
[2022/11/09 15:51:53] ppocr INFO: hidden_size : 64
[2022/11/09 15:51:53] ppocr INFO: name : SequenceEncoder
[2022/11/09 15:51:53] ppocr INFO: Transform : None
[2022/11/09 15:51:53] ppocr INFO: algorithm : CRNN
[2022/11/09 15:51:53] ppocr INFO: freeze_params : False
[2022/11/09 15:51:53] ppocr INFO: model_type : rec
[2022/11/09 15:51:53] ppocr INFO: pretrained : None
[2022/11/09 15:51:53] ppocr INFO: return_all_feats : True
[2022/11/09 15:51:53] ppocr INFO: algorithm : Distillation
[2022/11/09 15:51:53] ppocr INFO: model_type : rec
[2022/11/09 15:51:53] ppocr INFO: name : DistillationModel
[2022/11/09 15:51:53] ppocr INFO: Eval :
[2022/11/09 15:51:53] ppocr INFO: dataset :
[2022/11/09 15:51:53] ppocr INFO: data_dir : E:/license/train_data/rec
[2022/11/09 15:51:53] ppocr INFO: label_file_list : ['E:/license/train_data/rec_gt_test.txt']
[2022/11/09 15:51:53] ppocr INFO: name : SimpleDataSet
[2022/11/09 15:51:53] ppocr INFO: transforms :
[2022/11/09 15:51:53] ppocr INFO: DecodeImage :
[2022/11/09 15:51:53] ppocr INFO: channel_first : False
[2022/11/09 15:51:53] ppocr INFO: img_mode : BGR
[2022/11/09 15:51:53] ppocr INFO: CTCLabelEncode : None
[2022/11/09 15:51:53] ppocr INFO: RecResizeImg :
[2022/11/09 15:51:53] ppocr INFO: image_shape : [3, 32, 320]
[2022/11/09 15:51:53] ppocr INFO: KeepKeys :
[2022/11/09 15:51:53] ppocr INFO: keep_keys : ['image', 'label', 'length']
[2022/11/09 15:51:53] ppocr INFO: loader :
[2022/11/09 15:51:53] ppocr INFO: batch_size_per_card : 64
[2022/11/09 15:51:53] ppocr INFO: drop_last : False
[2022/11/09 15:51:53] ppocr INFO: num_workers : 1
[2022/11/09 15:51:53] ppocr INFO: shuffle : False
[2022/11/09 15:51:53] ppocr INFO: Global :
[2022/11/09 15:51:53] ppocr INFO: cal_metric_during_train : True
[2022/11/09 15:51:53] ppocr INFO: character_dict_path : E:/PaddleOCR-release-2.6/ppocr/utils/ppocr_keys_v1.txt
[2022/11/09 15:51:53] ppocr INFO: checkpoints : None
[2022/11/09 15:51:53] ppocr INFO: debug : False
[2022/11/09 15:51:53] ppocr INFO: distributed : False
[2022/11/09 15:51:53] ppocr INFO: epoch_num : 800
[2022/11/09 15:51:53] ppocr INFO: eval_batch_step : [0, 2000]
[2022/11/09 15:51:53] ppocr INFO: infer_img : doc/imgs_words/ch/word_1.jpg
[2022/11/09 15:51:53] ppocr INFO: infer_mode : False
[2022/11/09 15:51:53] ppocr INFO: log_smooth_window : 20
[2022/11/09 15:51:53] ppocr INFO: max_text_length : 25
[2022/11/09 15:51:53] ppocr INFO: pretrained_model : E:/PaddleOCR-release-2.6/pretrain_model/ch_PP-OCRv2_rec_train/best_accuracy.pdparams
[2022/11/09 15:51:53] ppocr INFO: print_batch_step : 10
[2022/11/09 15:51:53] ppocr INFO: save_epoch_step : 3
[2022/11/09 15:51:53] ppocr INFO: save_inference_dir : None
[2022/11/09 15:51:53] ppocr INFO: save_model_dir : ./output/rec_pp-OCRv2_distillation
[2022/11/09 15:51:53] ppocr INFO: save_res_path : ./output/rec/predicts_pp-OCRv2_distillation.txt
[2022/11/09 15:51:53] ppocr INFO: use_gpu : True
[2022/11/09 15:51:53] ppocr INFO: use_space_char : True
[2022/11/09 15:51:53] ppocr INFO: use_visualdl : False
[2022/11/09 15:51:53] ppocr INFO: Loss :
[2022/11/09 15:51:53] ppocr INFO: loss_config_list :
[2022/11/09 15:51:53] ppocr INFO: DistillationCTCLoss :
[2022/11/09 15:51:53] ppocr INFO: key : head_out
[2022/11/09 15:51:53] ppocr INFO: model_name_list : ['Student', 'Teacher']
[2022/11/09 15:51:53] ppocr INFO: weight : 1.0
[2022/11/09 15:51:53] ppocr INFO: DistillationDMLLoss :
[2022/11/09 15:51:53] ppocr INFO: act : softmax
[2022/11/09 15:51:53] ppocr INFO: key : head_out
[2022/11/09 15:51:53] ppocr INFO: model_name_pairs : [['Student', 'Teacher']]
[2022/11/09 15:51:53] ppocr INFO: use_log : True
[2022/11/09 15:51:53] ppocr INFO: weight : 1.0
[2022/11/09 15:51:53] ppocr INFO: DistillationDistanceLoss :
[2022/11/09 15:51:53] ppocr INFO: key : backbone_out
[2022/11/09 15:51:53] ppocr INFO: mode : l2
[2022/11/09 15:51:53] ppocr INFO: model_name_pairs : [['Student', 'Teacher']]
[2022/11/09 15:51:53] ppocr INFO: weight : 1.0
[2022/11/09 15:51:53] ppocr INFO: name : CombinedLoss
[2022/11/09 15:51:53] ppocr INFO: Metric :
[2022/11/09 15:51:53] ppocr INFO: base_metric_name : RecMetric
[2022/11/09 15:51:53] ppocr INFO: key : Student
[2022/11/09 15:51:53] ppocr INFO: main_indicator : acc
[2022/11/09 15:51:53] ppocr INFO: name : DistillationMetric
[2022/11/09 15:51:53] ppocr INFO: Optimizer :
[2022/11/09 15:51:53] ppocr INFO: beta1 : 0.9
[2022/11/09 15:51:53] ppocr INFO: beta2 : 0.999
[2022/11/09 15:51:53] ppocr INFO: lr :
[2022/11/09 15:51:53] ppocr INFO: decay_epochs : [700, 800]
[2022/11/09 15:51:53] ppocr INFO: name : Piecewise
[2022/11/09 15:51:53] ppocr INFO: values : [0.0005, 0.0001]
[2022/11/09 15:51:53] ppocr INFO: warmup_epoch : 5
[2022/11/09 15:51:53] ppocr INFO: name : Adam
[2022/11/09 15:51:53] ppocr INFO: regularizer :
[2022/11/09 15:51:53] ppocr INFO: factor : 2e-05
[2022/11/09 15:51:53] ppocr INFO: name : L2
[2022/11/09 15:51:53] ppocr INFO: PostProcess :
[2022/11/09 15:51:53] ppocr INFO: key : head_out
[2022/11/09 15:51:53] ppocr INFO: model_name : ['Student', 'Teacher']
[2022/11/09 15:51:53] ppocr INFO: name : DistillationCTCLabelDecode
[2022/11/09 15:51:53] ppocr INFO: Train :
[2022/11/09 15:51:53] ppocr INFO: dataset :
[2022/11/09 15:51:53] ppocr INFO: data_dir : E:/license/train_data/rec
[2022/11/09 15:51:53] ppocr INFO: label_file_list : ['E:/license/train_data/rec_gt_train.txt']
[2022/11/09 15:51:53] ppocr INFO: name : SimpleDataSet
[2022/11/09 15:51:53] ppocr INFO: transforms :
[2022/11/09 15:51:53] ppocr INFO: DecodeImage :
[2022/11/09 15:51:53] ppocr INFO: channel_first : False
[2022/11/09 15:51:53] ppocr INFO: img_mode : BGR
[2022/11/09 15:51:53] ppocr INFO: RecAug : None
[2022/11/09 15:51:53] ppocr INFO: CTCLabelEncode : None
[2022/11/09 15:51:53] ppocr INFO: RecResizeImg :
[2022/11/09 15:51:53] ppocr INFO: image_shape : [3, 32, 320]
[2022/11/09 15:51:53] ppocr INFO: KeepKeys :
[2022/11/09 15:51:53] ppocr INFO: keep_keys : ['image', 'label', 'length']
[2022/11/09 15:51:53] ppocr INFO: loader :
[2022/11/09 15:51:53] ppocr INFO: batch_size_per_card : 64
[2022/11/09 15:51:53] ppocr INFO: drop_last : True
[2022/11/09 15:51:53] ppocr INFO: num_sections : 1
[2022/11/09 15:51:53] ppocr INFO: num_workers : 1
[2022/11/09 15:51:53] ppocr INFO: shuffle : True
[2022/11/09 15:51:53] ppocr INFO: profiler_options : None
[2022/11/09 15:51:53] ppocr INFO: train with paddle 2.3.1 and device Place(gpu:0)
[2022/11/09 15:51:53] ppocr INFO: Initialize indexs of datasets:['E:/license/train_data/rec_gt_train.txt']
[2022/11/09 15:51:53] ppocr INFO: Initialize indexs of datasets:['E:/license/train_data/rec_gt_test.txt']
image

llwowowowoll commented 2 years ago

I've run into this problem too.

littletomatodonkey commented 2 years ago

Hi, the config file you are using looks like a PP-OCRv3 one, so it cannot be used with PP-OCRv2 weights. We will also update the finetune documentation.
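One way to see this kind of config/weights mismatch before training is to compare parameter names and shapes on both sides. The sketch below is a minimal, framework-free illustration: `compare_params` and every parameter name and shape in it are hypothetical, not taken from PaddleOCR. With Paddle, the two dictionaries could presumably be built from the shapes in `model.state_dict()` and in the result of `paddle.load(...)` on the `.pdparams` file.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def compare_params(
    model: Dict[str, Shape], checkpoint: Dict[str, Shape]
) -> Dict[str, List[str]]:
    """Group parameter names by whether the checkpoint can fill them."""
    report: Dict[str, List[str]] = {
        "loadable": [],        # same name, same shape
        "missing": [],         # in the model, absent from the checkpoint
        "shape_mismatch": [],  # same name, different shape
        "unused": [],          # in the checkpoint, absent from the model
    }
    for name, shape in model.items():
        if name not in checkpoint:
            report["missing"].append(name)
        elif checkpoint[name] != shape:
            report["shape_mismatch"].append(name)
        else:
            report["loadable"].append(name)
    report["unused"] = [name for name in checkpoint if name not in model]
    return report


# Hypothetical names and shapes, for illustration only.
model_params = {
    "backbone.conv1.weight": (16, 3, 3, 3),
    "neck.encoder.weight": (64, 120),
    "head.fc.weight": (64, 6625),
}
checkpoint_params = {
    "backbone.conv1.weight": (16, 3, 3, 3),
    "neck.lstm.weight": (256, 96),
    "head.fc.weight": (96, 6625),
}

report = compare_params(model_params, checkpoint_params)
for kind, names in report.items():
    print(f"{kind}: {names}")
```

If most parameters land in `missing` or `shape_mismatch`, the network is effectively training from random initialization, which would be consistent with the warning and the zero accuracy reported above.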

littletomatodonkey commented 2 years ago

Hi, please take a look at the finetune documentation in this PR: https://github.com/PaddlePaddle/PaddleOCR/pull/8302/files

jackliu1111 commented 2 years ago

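Hi, please take a look at the finetune documentation in this PR: https://github.com/PaddlePaddle/PaddleOCR/pull/8302/files

I have switched to PP-OCRv3, but the accuracy still starts from 0. When fine-tuning, I added rare characters to the dictionary. Does the model have to start from 0 because the one-hot encoding of the inputs and outputs has changed? If so, does fine-tuning from the pretrained model still help speed up convergence?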

tink2123 commented 1 year ago

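If you modify the dictionary when fine-tuning, the accuracy starts from 0 in the early iterations. Using a pretrained model can still speed up convergence.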
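A rough way to see why a changed dictionary resets the early accuracy: the number of output classes of a CTC head depends on the dictionary size, so the final layer no longer has the shape stored in the pretrained weights. The sketch below assumes the class count is dictionary characters, plus one for the space character when `use_space_char` is true, plus one CTC blank. The dictionary sizes, the 100 added characters and the layer width are hypothetical.

```python
def ctc_num_classes(dict_size: int, use_space_char: bool = True) -> int:
    """Dictionary characters + optional space + one CTC blank (assumed layout)."""
    return dict_size + int(use_space_char) + 1


# Hypothetical sizes: an original dictionary and the same one with 100 rare characters added.
original_dict_size = 6623
extended_dict_size = original_dict_size + 100

old_classes = ctc_num_classes(original_dict_size)
new_classes = ctc_num_classes(extended_dict_size)

# Hypothetical width of the features fed into the final fully connected layer.
feature_width = 64
pretrained_fc_shape = (feature_width, old_classes)
finetune_fc_shape = (feature_width, new_classes)

# Only layers whose shapes are unchanged can reuse pretrained weights.
can_reuse_head = pretrained_fc_shape == finetune_fc_shape

print("pretrained head:", pretrained_fc_shape)
print("fine-tune head: ", finetune_fc_shape)
print("head weights reusable:", can_reuse_head)
```

Under these assumptions the output layer has to be learned again, while the backbone and neck keep their pretrained weights, which matches the statement that accuracy starts from 0 but convergence is still faster.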

Tavish77 commented 1 year ago

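I used some data I generated myself for finetuning: handwritten character images with different fonts and backgrounds. During training, ACC stays at 0. Is the model unable to fit this kind of data?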

ericosmic commented 1 year ago

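When I fine-tune the model, acc also starts at 0 and rises gradually, but I did not modify the dictionary. With a pretrained model loaded, shouldn't acc start from a high value?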
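One factor that may be worth checking here, stated as an assumption rather than a diagnosis: if the logged `acc` counts a sample as correct only when the whole predicted string equals the label, it can stay near 0 while most individual characters are already right. The sketch below contrasts the two views on hypothetical predictions. Both functions are illustrative, and `char_accuracy` is a simple position-wise comparison, not a metric taken from PaddleOCR.

```python
from typing import Sequence


def line_accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of samples whose full string matches the label exactly."""
    correct = sum(p == t for p, t in zip(preds, labels))
    return correct / len(labels)


def char_accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of label characters matched at the same position (simplified)."""
    matched = 0
    total = 0
    for p, t in zip(preds, labels):
        matched += sum(a == b for a, b in zip(p, t))
        total += len(t)
    return matched / total


# Hypothetical labels and predictions: one wrong character per line.
labels = ["AB12345", "CD67890"]
preds = ["AB12845", "CD67390"]

line_acc = line_accuracy(preds, labels)
char_acc = char_accuracy(preds, labels)
print(f"line accuracy: {line_acc:.2f}")
print(f"char accuracy: {char_acc:.2f}")
```

In this toy case every line contains a single wrong character, so the exact-match rate is 0 even though 12 of 14 characters are correct.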

ericosmic commented 1 year ago

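In addition, I trained with the distillation training config of the v3 model. On the same dataset, the v2 model's acc eventually reaches 96%, while the v3 model's acc plateaus at 75%.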

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.