PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

The more epochs I train, the worse the actual results: epochs 1 and 2 are good, then it keeps getting worse #5510

Closed dengmingD closed 2 years ago

dengmingD commented 2 years ago

The more epochs I train, the worse the actual results get. Epochs 1 and 2 give good results, and after that it keeps getting worse.

Prediction results from epoch 2:

[2022/02/21 11:47:09] root DEBUG: 0 Predict time of d:/888.jpg: 3.547s
[2022/02/21 11:47:09] root DEBUG: 湖南远哥食品有限公司, 0.974
[2022/02/21 11:47:09] root DEBUG: 湖南远哥食品有限公司食品有限公司湖南远哥食品有限, 0.990
[2022/02/21 11:47:09] root DEBUG: 销, 0.999
[2022/02/21 11:47:09] root DEBUG: 售, 0.998
[2022/02/21 11:47:09] root DEBUG: 合, 0.996
[2022/02/21 11:47:09] root DEBUG: 同, 0.995
[2022/02/21 11:47:09] root DEBUG: 书, 1.000
[2022/02/21 11:47:09] root DEBUG: 公司总部:湖南湘潭市雨湖区姜畲镇梅花工业园2期6栋, 0.965
[2022/02/21 11:47:09] root DEBUG: 公司总部:湖南湘潭市雨湖区姜畲镇梅花工业园2期6栋, 0.977
[2022/02/21 11:47:09] root DEBUG: 公司总部:湖南湘潭市雨湖区姜畲镇梅花工业园2期6栋, 0.979
[2022/02/21 11:47:09] root DEBUG: 招商电话:400-777-073215675277888, 0.988
[2022/02/21 11:47:09] root DEBUG: 第1页共10页, 0.997

Prediction results from epoch 179:

[2022/02/21 11:36:57] root DEBUG: 0 Predict time of d:/888.jpg: 3.450s
[2022/02/21 11:36:57] root DEBUG: 督, 0.501
[2022/02/21 11:36:57] root DEBUG: 闻, 0.581
[2022/02/21 11:36:57] root DEBUG: 书, 0.949
[2022/02/21 11:36:57] root DEBUG: 公司总部:胡帝缩谭节丽溯区姜御镇梅花工业园26相, 0.749
[2022/02/21 11:36:57] root DEBUG: 公司总彰:湖南羁潭市雨湖区姜奇篡梅秘工业z期飞, 0.596
[2022/02/21 11:36:57] root DEBUG: 公司总颜:湖函潭市雨湖区桑裕篡秘工业愿2期飞, 0.635
[2022/02/21 11:36:57] root DEBUG: 暴窗电:qwvoo2.1sss2www, 0.589

The accuracy at epoch 179 is clearly worse than at epoch 2.

The config file is as follows:

Global:
  debug: false
  use_gpu: true
  epoch_num: 1000
  log_smooth_window: 200
  print_batch_step: 100
  save_model_dir: ./output/rec_pp-OCRv2_distillation_6
  save_epoch_step: 1
  eval_batch_step: [0, 10000]
  cal_metric_during_train: true
  pretrained_model: ./train_model/ch_PP-OCRv2_rec_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  character_type: ch
  max_text_length: 15
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_pp-OCRv2_distillation.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Piecewise
    decay_epochs : [700, 800]
    values : [0.001, 0.0001]
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 2.0e-05

Architecture:
  model_type: &model_type "rec"
  name: DistillationModel
  algorithm: Distillation
  Models:
    Teacher:
      pretrained:
      freeze_params: false
      return_all_feats: true
      model_type: *model_type
      algorithm: CRNN
      Transform:
      Backbone:
        name: MobileNetV1Enhance
        scale: 0.5
      Neck:
        name: SequenceEncoder
        encoder_type: rnn
        hidden_size: 64
      Head:
        name: CTCHead
        mid_channels: 96
        fc_decay: 0.00002
    Student:
      pretrained:
      freeze_params: false
      return_all_feats: true
      model_type: *model_type
      algorithm: CRNN
      Transform:
      Backbone:
        name: MobileNetV1Enhance
        scale: 0.5
      Neck:
        name: SequenceEncoder
        encoder_type: rnn
        hidden_size: 64
      Head:
        name: CTCHead
        mid_channels: 96
        fc_decay: 0.00002

Loss:
  name: CombinedLoss
  loss_config_list:

PostProcess:
  name: DistillationCTCLabelDecode
  model_name: ["Student", "Teacher"]
  key: head_out

Metric:
  name: DistillationMetric
  base_metric_name: RecMetric
  main_indicator: acc
  key: "Student"

Train:
  dataset:
    name: SimpleDataSet
    data_dir: D:/paddle21/yu_liao_ku/train_data_6/
    label_file_list:
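
For reference, the Optimizer section of the config above can be read as a schedule roughly like the one sketched below. This is a plain-Python approximation and not PaddleOCR's implementation: the function name `approx_lr`, zero-based epoch counting, per-epoch (rather than per-step) granularity, and a linear warmup starting from 0 are all assumptions made for illustration.

```python
def approx_lr(epoch, decay_epochs=(700, 800), values=(0.001, 0.0001), warmup_epoch=5):
    """Approximate learning rate at a given epoch (epochs counted from 0).

    Defaults mirror decay_epochs, values and warmup_epoch from the posted config.
    """
    if epoch < warmup_epoch:
        # Assumed warmup shape: linear ramp from 0 up to the first scheduled value.
        return values[0] * epoch / warmup_epoch
    # Piecewise-constant part: use values[i] until decay_epochs[i] is reached.
    for boundary, value in zip(decay_epochs, values):
        if epoch < boundary:
            return value
    # Past the last boundary, keep the last listed value.
    return values[-1]


if __name__ == "__main__":
    for e in (1, 2, 5, 179, 700, 900):
        print(e, approx_lr(e))
```

Under these assumptions the rate is still ramping up during the first few epochs and then stays at 0.001 until epoch 700, so the epoch-179 checkpoint was trained at the full rate while the epoch-2 checkpoint saw a smaller one.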

tink2123 commented 2 years ago

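Could this be because there is too little training data? It looks like the model is overfitting. I suggest adding more training data, or lowering the learning rate and reducing the total number of iterations.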
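
One generic way to act on the suggestion to reduce the total number of iterations is to stop once the evaluation accuracy has not improved for a few evaluations and keep the best checkpoint. The sketch below is a hypothetical helper, not a PaddleOCR feature: the name `best_epoch_with_patience`, the `patience` parameter, and the accuracy values in the usage example are illustrative assumptions, not measurements from this issue.

```python
def best_epoch_with_patience(accuracies, patience=3):
    """Return (best_epoch, stop_epoch), both 1-based.

    accuracies: evaluation accuracy after each epoch, in order.
    patience: stop once this many epochs pass without a new best accuracy.
    """
    best_acc = float("-inf")
    best_epoch = 0
    for epoch, acc in enumerate(accuracies, start=1):
        if acc > best_acc:
            # New best result: remember it and keep training.
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_epoch, epoch
    # Never ran out of patience: training ended at the last epoch.
    return best_epoch, len(accuracies)


if __name__ == "__main__":
    # Made-up accuracies that peak early and then degrade.
    history = [0.90, 0.95, 0.93, 0.91, 0.80, 0.70]
    print(best_epoch_with_patience(history, patience=3))
```

With a curve that peaks early, as described in this issue, such a rule would keep the early checkpoint instead of training for all 1000 configured epochs.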

dengmingD commented 2 years ago

It is not a matter of too little data; there are 5 million samples. I will try adjusting the learning rate.

littletomatodonkey commented 2 years ago

Hello, the learning rate and the batch size need to be changed in proportion to each other.
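
The proportional relationship mentioned above is often applied as a simple linear scaling rule. The following is a minimal sketch under that assumption: the function name `scale_lr` and the batch sizes in the usage example are hypothetical, since the thread does not state the batch size actually used; only the base rate 0.001 comes from the posted config.

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Scale a learning rate linearly with the batch size.

    base_lr was tuned for base_batch_size; the result is the rate
    to try when training with new_batch_size instead.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


if __name__ == "__main__":
    # Hypothetical batch sizes: a rate of 0.001 tuned for 128 samples per step,
    # reused on a setup that only fits 32 samples per step.
    print(scale_lr(0.001, base_batch_size=128, new_batch_size=32))
```

Treat the result as a starting point to validate on the evaluation set, not as a guaranteed optimum.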

paddle-bot-old[bot] commented 2 years ago

Since you haven't replied for more than 3 months, we have closed this issue/pr. If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.