PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

best metric, acc: 0.0 on recognition #8075

Closed. Yosiiiiiiiiiiiiiiii closed this issue 1 year ago.

Yosiiiiiiiiiiiiiiii commented 2 years ago

The training seems to be OK. The final epoch shows acc: 0.988281, but the best metric reports acc: 0.0. What's wrong with that? I used my custom dataset and adjusted dict.txt. Here is my config:

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0.00001

Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 96
  Head:
    name: CTCHead
    fc_decay: 0.00001

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/custom_dataset/train/
    label_file_list: ["./train_data/custom_dataset/rec_gt_train.txt"]
    transforms:

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/custom_dataset/test
    label_file_list: ["./train_data/custom_dataset/rec_gt_test.txt"]
    transforms:

Then I ran this: [screenshot: Screen Shot 2565-10-24 at 16 57 45]

and I got this result: [screenshot: Screen Shot 2565-10-24 at 16 58 45]

drenched9 commented 2 years ago

Is the dict.txt you adjusted the same file as the one you pass in your command with "Global.character_dict_path=ppocr/utils/ic15_dict.txt"?

Yosiiiiiiiiiiiiiiii commented 2 years ago

@drenched9 I extended ic15_dict.txt with more English characters.
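One way to rule out a dictionary mismatch like the one asked about above is to check that every character in the label files also appears in the dictionary file. The following is a minimal sketch, not part of PaddleOCR: the helper names `load_dict` and `find_missing_chars` are hypothetical, and it assumes the dictionary holds one character per line and that each label line has the form `image_path<TAB>label`.

```python
from pathlib import Path


def load_dict(dict_path):
    """Return the set of characters in a one-character-per-line dictionary file."""
    chars = set()
    for line in Path(dict_path).read_text(encoding="utf-8").splitlines():
        if line:  # skip empty lines
            chars.add(line)
    return chars


def find_missing_chars(label_path, dict_chars, delimiter="\t", allow_space=True):
    """Scan a label file and report characters that are not in the dictionary.

    Returns (missing, malformed):
      missing   -- dict mapping each unknown character to the line numbers using it
      malformed -- list of line numbers that do not contain the delimiter
    """
    missing = {}
    malformed = []
    lines = Path(label_path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if delimiter not in line:
            malformed.append(line_no)
            continue
        _, label = line.split(delimiter, 1)
        for ch in label:
            if allow_space and ch == " ":
                continue  # assumption: spaces are handled by a separate config switch
            if ch not in dict_chars:
                missing.setdefault(ch, []).append(line_no)
    return missing, malformed


# Example call, using the paths from this thread:
# chars = load_dict("ppocr/utils/ic15_dict.txt")
# print(find_missing_chars("./train_data/custom_dataset/rec_gt_test.txt", chars))
```

If either the training or the evaluation label file reports missing characters or malformed lines, that is worth fixing before looking at the model itself.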

bely66 commented 2 years ago

@Yosiiiiiiiiiiiiiiii Hi, I have two questions: did you disable RecConAug on purpose, and how large was the training set you used? I can see that your model's training accuracy is pretty good. I'm training my model on a large dataset of 9M samples, which usually reaches high training accuracy (above 95%) with models like vanilla CRNN or SAR, but with PPOCRv3 the accuracy drops to 83%.

Yosiiiiiiiiiiiiiiii commented 2 years ago

Hi @bely66

"RecConAug on purpose?" >> I didn't change anything. I was training CRNN, following rec_icdar15_train.yml.

"I can see the training accuracy of your model doing pretty well." >> No, it was not; it was overfitting, so I added 64k samples to the training set, and the accuracy went above 98%.

I can't get training to work with PPOCRv3. How did you do that? I posted my issue here: https://github.com/PaddlePaddle/PaddleOCR/issues/8178 Please help if you can ^^

tink2123 commented 1 year ago

@Yosiiiiiiiiiiiiiiii The model is seriously overfitting. I recommend loading a pre-trained model and reducing the learning rate (try reducing it to 0.0001).
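The suggestion above can be applied without editing the YAML file by passing overrides to the training script with `-o`. The sketch below only assembles such a command line. The helper `build_train_command` is hypothetical, the pretrained-model path is a placeholder (the thread does not say which weights to use), and the config path assumes the rec_icdar15_train.yml mentioned earlier lives under configs/rec/.

```python
import shlex


def build_train_command(config_path, pretrained_model, learning_rate=0.0001):
    """Assemble a tools/train.py call that overrides two config entries."""
    overrides = [
        # Start from pre-trained weights instead of a random initialization.
        f"Global.pretrained_model={pretrained_model}",
        # Lower the learning rate from the 0.001 in the posted config.
        f"Optimizer.lr.learning_rate={learning_rate}",
    ]
    return ["python3", "tools/train.py", "-c", config_path, "-o", *overrides]


cmd = build_train_command(
    "configs/rec/rec_icdar15_train.yml",           # assumed config location
    "./pretrain_models/your_model/best_accuracy",  # placeholder path, adjust to your download
)
print(shlex.join(cmd))
```

The same two values can instead be set directly in the config, under `Global.pretrained_model` and `Optimizer.lr.learning_rate`.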