aspaul20 opened this issue 1 month ago (status: Open)
It is suggested to try fine-tuning the model instead of retraining it: https://paddlepaddle.github.io/PaddleOCR/en/ppocr/model_train/finetune.html
@jingsongliujing I believe I am already fine-tuning the model. See this line from my config:
pretrained_model: weights/ch/ch_PP-OCRv4_rec_train/student
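For anyone hitting the same question: a fine-tuning setup generally needs something like the following in the `Global` section. This is an illustrative sketch, not my full config; the `pretrained_model` path is the line quoted above, and the other keys are standard defaults for the Chinese rec models.

```yaml
# Illustrative Global excerpt for fine-tuning (not a complete config).
Global:
  # Load pretrained weights; leave checkpoints unset so training
  # starts from these weights instead of resuming a previous run.
  pretrained_model: weights/ch/ch_PP-OCRv4_rec_train/student
  checkpoints: null
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: 140
```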
🔎 Search before asking
🐛 Bug (问题描述)
I am trying to train the ch_PP-OCRv4_rec.yml recognition model on a custom dataset of very wide images: up to 2480 pixels wide and only ~50 pixels high. Some images also contain a lot of text that I need to recognize, up to 135 characters per string. When training with the default configuration and max_text_length set to 140, accuracy plateaus at about 50% and never improves further.
Upon some research, I saw that when max_text_length is increased, the input image width should also be increased so that the resized image doesn't become too blurry. I also turned off additional augmentations such as RecAug and RecConAug because they were warping the images and making them unreadable. But the performance does not improve. My config now is as follows:
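Concretely, the two changes described above look roughly like this. This is a sketch against ch_PP-OCRv4_rec.yml; the width 2480 is an assumption matching my widest images, and exact transform key names can vary between PaddleOCR versions.

```yaml
# Sketch of the relevant overrides (width 2480 is an assumption for my data).
Global:
  max_text_length: 140          # longest label is 135 chars, so leave headroom

Eval:
  dataset:
    transforms:
      # Widen the resize target so long text lines are not squashed and blurred.
      # (RecAug / RecConAug entries were removed from the Train transforms.)
      - RecResizeImg:
          image_shape: [3, 48, 2480]
```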
A recent training step (the metrics barely improve no matter how many epochs I run):
[2024/10/02 12:12:08] ppocr INFO: epoch: [31/250], global_step: 830, lr: 0.000975, acc: 0.416666, norm_edit_dis: 0.437640, CTCLoss: 0.020622, NRTRLoss: 1.209766, loss: 1.234407, avg_reader_cost: 0.00011 s, avg_batch_cost: 0.47500 s, avg_samples: 18.0, ips: 37.89484 samples/s, eta: 0:50:12, max_mem_reserved: 7172 MB, max_mem_allocated: 6961 MB
About the dataset: I have ~450 original images, each of which I have augmented 100x to get ~45,000 samples. Results don't change whether I use the augmented dataset or the original one. Even overfitting on this training data would be satisfactory, but the model doesn't get there either. Please help.
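One sanity check worth running on a dataset like this: confirm that max_text_length actually covers the longest label in the annotation file. A small sketch, assuming PaddleOCR's recognition label format of one `image_path<TAB>label` entry per line (`train_list.txt` is a placeholder for your annotation file):

```python
def label_lengths(lines):
    """Character length of each label in a PaddleOCR rec annotation list,
    where every line is 'image_path<TAB>label'."""
    lengths = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        _, label = line.split("\t", 1)  # split on the first tab only
        lengths.append(len(label))
    return lengths

# Example usage (train_list.txt is a placeholder for your annotation file):
# with open("train_list.txt", encoding="utf-8") as f:
#     print(max(label_lengths(f)))  # max_text_length must be >= this value
```

If the maximum here exceeds max_text_length, those labels get truncated during encoding and the model can never reach full accuracy on them.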
🏃‍♂️ Environment (运行环境)
🌰 Minimal Reproducible Example (最小可复现问题的Demo)
python3 tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml
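To make the repro self-contained, the key settings can also be overridden on the command line with `-o` instead of editing the YAML (the pretrained_model path below is from my setup and is a placeholder for yours):

```shell
python3 tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml \
    -o Global.pretrained_model=weights/ch/ch_PP-OCRv4_rec_train/student \
       Global.max_text_length=140
```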
I cannot share the dataset.