PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkit based on PaddlePaddle: a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices.
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

Low Validation Accuracy for PPOCRv4 if Image resolution is changed #13252

Closed ManikSinghSarmaal closed 3 months ago

ManikSinghSarmaal commented 3 months ago

Problem Description

For about a month I have been hitting a problem (specifically with PP-OCRv4) when I change the image resolution from the default [3, 48, 320] to [3, 32, 150]. Something seems off: PP-OCRv4's config introduces MultiScaleDataset for training while evaluation uses SimpleDataset, whereas PP-OCRv3's config used SimpleDataset for both train and eval. The main problem: if you change the image resolution to [3, 32, 150] in the v4 config, training accuracy is high (around 95%) while evaluation on that data gives very poor results. This is not simply overfitting: I tried training on train+eval data combined, and the logs showed around 98% training accuracy while evaluation on the eval subset was around 40%. I also tried setting both the train and eval dataloaders to MultiScaleDataset, and both to SimpleDataset, but it didn't help. I cannot understand how training on some data can show 98%+ accuracy while evaluation on a small subset of that same data gives around 40%. The same happens with the training data itself: the training logs show 98% accuracy, but evaluating that same training data with best_accuracy.pdparams gives about 58%, similar to the eval result. What could be the reason for this discrepancy?

Runtime Environment

Reproduction Code

My config is (truncated; `eval_batch_step` and `head_list` were cut off in the paste):

```yaml
Global:
  debug: true
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/finally_v4
  save_epoch_step: 100
  eval_batch_step:

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform: null
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
```

My train data is T1+V1 and my eval data is V1.

Complete Error Message

```text
ppocr INFO: epoch: [199/500], global_step: 71970, lr: 0.000337, acc: 0.950521, norm_edit_dis: 0.993425, CTCLoss: 0.303188, NRTRLoss: 0.711274, loss: 1.013391, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.78093 s, avg_samples: 84.8, ips: 108.58877 samples/s, eta: 1 day, 4:10:31, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:06] ppocr INFO: epoch: [199/500], global_step: 71980, lr: 0.000337, acc: 0.953125, norm_edit_dis: 0.993425, CTCLoss: 0.268923, NRTRLoss: 0.712596, loss: 0.986175, avg_reader_cost: 0.00024 s, avg_batch_cost: 0.79516 s, avg_samples: 64.0, ips: 80.48709 samples/s, eta: 1 day, 4:10:19, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:14] ppocr INFO: epoch: [199/500], global_step: 71990, lr: 0.000337, acc: 0.958333, norm_edit_dis: 0.992793, CTCLoss: 0.238311, NRTRLoss: 0.712596, loss: 0.950907, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.79268 s, avg_samples: 75.2, ips: 94.86814 samples/s, eta: 1 day, 4:10:08, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:22] ppocr INFO: epoch: [199/500], global_step: 72000, lr: 0.000337, acc: 0.942708, norm_edit_dis: 0.992840, CTCLoss: 0.244233, NRTRLoss: 0.708435, loss: 0.952667, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.77737 s, avg_samples: 78.4, ips: 100.85287 samples/s, eta: 1 day, 4:09:57, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
eval model:: 100%|██████████| 34/34 [00:04<00:00, 8.21it/s]
[2024/07/03 05:54:26] ppocr INFO: cur metric, acc: 0.48791821447974393, norm_edit_dis: 0.9153030167021731, fps: 2919.5675084361587
[2024/07/03 05:54:26] ppocr INFO: best metric, acc: 0.5864312254032732, is_float16: False, norm_edit_dis: 0.9353985370995163, fps: 2784.2707554228186, best_epoch: 188
[2024/07/03 05:54:34] ppocr INFO: epoch: [199/500], global_step: 72010, lr: 0.000337, acc: 0.955729, norm_edit_dis: 0.994558, CTCLoss: 0.214817, NRTRLoss: 0.706834, loss: 0.922900, avg_reader_cost: 0.00024 s, avg_batch_cost: 0.78858 s, avg_samples: 72.0, ips: 91.30309 samples/s, eta: 1 day, 4:09:45, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:42] ppocr INFO: epoch: [199/500], global_step: 72020, lr: 0.000337, acc: 0.953125, norm_edit_dis: 0.993750, CTCLoss: 0.224658, NRTRLoss: 0.709030, loss: 0.934898, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.78265 s, avg_samples: 67.2, ips: 85.86192 samples/s, eta: 1 day, 4:09:34, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
```

Possible Solutions

Please help me with this issue.

Appendix

Topdu commented 3 months ago

MultiScaleDataset is a training strategy used to improve accuracy; it cannot be used during evaluation. Please refer to the PP-OCRv4 configuration file to set the parameters of the Eval dataset.
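For reference, the Eval section in the PP-OCRv4 recognition configs is typically shaped like the sketch below. The paths and the exact transform parameters here are placeholders for illustration, not copied from the upstream file; check the shipped `ch_PP-OCRv4_rec.yml` for the authoritative values:

```yaml
Eval:
  dataset:
    name: SimpleDataSet            # plain dataset; MultiScaleDataset is train-only
    data_dir: ./train_data/        # placeholder path
    label_file_list:
      - ./train_data/val_list.txt  # placeholder path
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
          gtc_encode: NRTRLabelEncode
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys: [image, label_ctc, label_gtc, length, valid_ratio]
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4
```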

ManikSinghSarmaal commented 3 months ago

Thanks for the help, but I figured it out. In the backbone used in PP-OCRv4, i.e. rec_lcnetv3.py, the forward pass applies adaptive_avg_pool2d to a fixed size in training mode but plain avg_pool2d in evaluation mode:

```python
if self.training:
    x = F.adaptive_avg_pool2d(x, [1, 40])
else:
    x = F.avg_pool2d(x, [3, 2])
return x
```

This works well with image size [48, 320] (height, width), because for that input shape the adaptive and the plain pooling coincidentally produce the same output size in both modes. But if you change the resolution to (32, 150), the final feature shapes differ between training and evaluation: training yields the fixed [1, 40] defined by adaptive_avg_pool2d, while evaluation uses plain avg_pool2d, which gives [1, 19]. This was the main reason I was getting lower accuracy after changing the resolution from the default 48×320. I still don't know whether this was intentional, or why adaptive average pooling was dropped in evaluation mode. Kindly elaborate on this.
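The mismatch can be checked with plain shape arithmetic, no Paddle required. The sketch below assumes the feature map reaching this pooling layer is roughly ceil(W/4) wide (an assumption for illustration, not measured from the real PPLCNetV3); with that, the eval-mode pooled width matches the adaptive target of 40 for a 320-wide input but not for a 150-wide one:

```python
import math

def avg_pool_out(size, kernel, stride=None):
    """Output length along one axis of an average pool using the
    Paddle/PyTorch convention: stride defaults to the kernel size,
    no padding, floor mode."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1

ADAPTIVE_WIDTH = 40  # adaptive_avg_pool2d(x, [1, 40]) always yields width 40

for img_w in (320, 150):
    # Assumed x4 horizontal downsampling in the backbone (illustrative only).
    feat_w = math.ceil(img_w / 4)
    eval_w = avg_pool_out(feat_w, kernel=2)  # width kernel of [3, 2] is 2
    print(f"img_w={img_w}: feat_w={feat_w}, eval width={eval_w}, "
          f"matches training width={eval_w == ADAPTIVE_WIDTH}")
# img_w=320 -> eval width 40 (matches); img_w=150 -> eval width 19 (mismatch)
```

So with the default 320-wide images the two branches agree by coincidence, and any other width silently breaks the train/eval consistency.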

xiaomaxiao commented 1 month ago

Because the training stage's MultiScaleSampler uses three different heights (32, 48, and 64), adaptive_avg_pool2d(x, [1, 40]) is used to handle the varying heights, but the 40 stays fixed even if you change the image width to 150. In evaluation mode, F.avg_pool2d(x, [3, 2]) is used instead to handle different image widths. If you want to train with other widths, e.g. 150 or 640, you can:

  1. Use x = F.avg_pool2d(x, [3, 2]) in both training and evaluation, or
  2. Remove MultiScaleSampler and just use height 48.
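Option 1 amounts to removing the `self.training` branch so the same fixed-kernel pool runs in both modes. The NumPy sketch below emulates that patched pooling head end to end (the feature-map size of 3×38 for a 32×150 input is an assumption for illustration, and `avg_pool2d` here is a minimal stand-in for `F.avg_pool2d`, not the Paddle implementation):

```python
import numpy as np

def avg_pool2d(x, kernel):
    """Minimal NumPy stand-in for F.avg_pool2d with stride == kernel size,
    no padding, floor mode. x has shape (N, C, H, W)."""
    kh, kw = kernel
    n, c, h, w = x.shape
    oh, ow = (h - kh) // kh + 1, (w - kw) // kw + 1
    x = x[:, :, :oh * kh, :ow * kw]  # drop the remainder rows/cols
    return x.reshape(n, c, oh, kh, ow, kw).mean(axis=(3, 5))

def pool_head(x):
    # Patched final pooling (option 1): no more `if self.training` branch,
    # so training and evaluation are identical by construction.
    return avg_pool2d(x, (3, 2))

# Assumed feature map for a 32x150 input (illustrative shape).
feat = np.random.rand(1, 512, 3, 38)
train_out = pool_head(feat)  # training mode
eval_out = pool_head(feat)   # evaluation mode
print(train_out.shape, eval_out.shape)  # same shape in both modes
```

The trade-off is that the output width now varies with the input width, so any downstream head must tolerate a variable sequence length, which is why option 2 (dropping MultiScaleSampler and fixing the height at 48) is the simpler path if you only need one resolution.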