Closed asif-ca closed 7 months ago
You can use pdb to see where the program ends. As the log shows, the model has been loaded.
@shiyutang Thank you very much for your reply.
I fixed the issue by setting epoch_num in the yml file to a value greater than the epoch_num for which the model had already been trained. The training loop was exiting immediately because the checkpoint had already reached the configured number of epochs. I was trying to resume training on changed data for 10 more epochs because the fine-tuned model has a white space issue: it is unable to detect white spaces.
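To illustrate why the resumed run stopped right away, here is a minimal sketch. It assumes the trainer iterates over `range(start_epoch, epoch_num + 1)`; that loop shape is an assumption made for illustration and is not quoted from the PaddleOCR source. The value `start_epoch=51` is taken from the log in this issue, while the two `epoch_num` values (50 and 60) are hypothetical.

```python
def epochs_to_run(start_epoch: int, epoch_num: int) -> list[int]:
    """Return the epochs a resumed run would still execute.

    Assumes the training loop is range(start_epoch, epoch_num + 1).
    """
    return list(range(start_epoch, epoch_num + 1))


# start_epoch=51 is reported in the log of this issue.
# epoch_num=50 is an assumed old config value, 60 an assumed new one.
print(epochs_to_run(51, 50))  # empty list: nothing left to train
print(epochs_to_run(51, 60))  # epochs 51 through 60: ten more epochs
```

If that assumption holds, the fix is to raise epoch_num in the yml file above the epoch stored in the checkpoint. It can probably also be overridden on the command line with `-o Global.epoch_num=...`, in the same way `Global.checkpoints` is passed in the command from this issue.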
@shiyutang, any direction on the white space issue would be really appreciated. Below is what I tried, based on the docs.
I added 50,000 white space images, i.e. images containing multiple words (as suggested here), to the 90,000-image recognizer dataset and resumed training for 20 epochs, but I still see zero improvement in detecting white spaces during recognition.
As I understand the white space issue from here, I need to add more images with white space to the dataset, so that is what I did.
The labels file looks like this:
000000035.jpg chipset natal
000000038.jpg acdbline usable
000000025.jpg csa offenses
But nothing improved. The pre-trained model is able to detect white spaces (it sometimes misses them, but it can detect them), whereas the fine-tuned model is unable to detect white spaces in text at all.
Am I doing something wrong with the dataset, compared to what is described here?
Please suggest!
I'm facing the same issue. Did you find a way to overcome the problem with the white spaces after fine-tuning?
@danteblink, it is crucial to add more data with white spaces to improve recognition accuracy. Please add as much data with white spaces as possible to achieve the best results. For further guidance, please refer to this article.
Initially, I fine-tuned the model on nearly half a million images for 50 epochs. However, most of the images contained only single words, and I added only 30,000 two-word images containing white spaces, which was too small a share of the dataset. As a result, the model was unable to detect white spaces.
After that, I fine-tuned the same base model on synthesized words that included white spaces: almost 100 thousand images in total, of which roughly 60-70 thousand had white spaces. This time, I observed that the model was able to detect white spaces. Currently, I am working on collecting real data to train the model further.
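Since the share of multi-word samples made the difference here (about 30,000 out of nearly half a million did not help, while roughly 60-70 thousand out of 100 thousand did), it can be worth measuring that share before training. Below is a minimal sketch. It assumes each label line is an image name followed by a tab or a single space and then the label text; the helper `whitespace_ratio` and the fourth sample entry are hypothetical and not part of PaddleOCR.

```python
def whitespace_ratio(label_lines):
    """Return the fraction of labels that contain at least one space."""
    total = 0
    with_space = 0
    for line in label_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Assumed layout: "<image name><TAB or space><label text>".
        separator = "\t" if "\t" in line else " "
        _, _, label = line.partition(separator)
        total += 1
        if " " in label.strip():
            with_space += 1
    return with_space / total if total else 0.0


sample = [
    "000000035.jpg chipset natal",
    "000000038.jpg acdbline usable",
    "000000025.jpg csa offenses",
    "000000041.jpg single",  # hypothetical single-word entry
]
print(whitespace_ratio(sample))  # 0.75
```

To check a real labels file, pass the open file object instead of the list, for example `whitespace_ratio(open(path, encoding="utf-8"))`, where `path` points to your own labels file.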
Also, try higher values of det_db_unclip_ratio, such as 2 or 3.
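from paddleocr import PaddleOCR

# Load the fine-tuned recognizer; a larger unclip ratio enlarges the detected text boxes.
custom_ocr = PaddleOCR(
    use_angle_cls=True,
    rec_model_dir='/content/rec_trained_model',
    det_db_unclip_ratio=2.9,
)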
@asif-ca Thank you for your response. I have improved the white space detection.
I was able to resume training; the explanation is given here.
Please provide the following information to quickly locate the problem
System Environment: 20.04.6 LTS
Version: Paddle: 2.5.1, PaddleOCR: 2.5
Related components:
Command Code:
python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.checkpoints=output/rec_new_train/latest
Complete Error Message:
[2023/10/16 11:30:45] ppocr INFO: resume from output/rec_new_train/latest
[2023/10/16 11:30:45] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 10000 iterations
[2023/10/16 11:30:45] ppocr INFO: best metric, acc: 0.9489788677900999, is_float16: False, norm_edit_dis: 0.9697556926937365, Teacher_acc: 0.9486444026809259, Teacher_norm_edit_dis: 0.9702948101480247, fps: 577.635715075861, best_epoch: 47, start_epoch: 51
@andyjpaddle @ZeyuChen @haobibo @bingooo @shiyutang @Evezerest please have a look
We provide AceIssueSolver to solve issues, do you want it? (Please write yes/no): Yes