Closed: ManikSinghSarmaal closed this issue 3 months ago
MultiScaleDataset is a training strategy used to improve accuracy; it cannot be used during evaluation. Please refer to the PP-OCRv4 configuration file to set the parameters of the Eval dataset.
Thanks for the help, but I figured it out. In the backbone used by PP-OCRv4 (rec_lcnetv3.py), the forward pass uses adaptive_avg_pool2d with a fixed output size in training mode and avg_pool2d in evaluation mode:
    if self.training:
        x = F.adaptive_avg_pool2d(x, [1, 40])
    else:
        x = F.avg_pool2d(x, [3, 2])
    return x
This works with the default image shape of 48,320 [height, width] in both training and evaluation, because adaptive and regular pooling coincidentally give the same output size. If you change the image resolution to (32,150), however, the final shapes differ between the two modes: training gives the fixed size [1,40] set by the adaptive average pool, while evaluation uses the regular avg_pool2d, which gives [1,19]. This was the main reason I was getting lower accuracy after changing the resolution from the default 48,320.
I still don't know whether this was intentional, or why adaptive average pooling was removed in evaluation mode. Could you please elaborate?
In the training stage, MultiScaleSampler uses three different heights (32, 48, and 64), so adaptive_avg_pool2d(x, [1, 40]) is used to handle the different heights. However, the '40' stays fixed even if you change the image width to 150. In evaluation mode, F.avg_pool2d(x, [3, 2]) is therefore used to handle different image widths. In the training stage, if you want to use other widths, e.g. 150, 640 ...
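The mismatch discussed above can be checked with plain shape arithmetic, without running the model. The sketch below is illustrative only: the helper names are hypothetical, and the feature-map widths (80 for a 320-wide image, 38 for a 150-wide image) are assumptions chosen to be consistent with the [1, 40] and [1, 19] outputs reported in this thread, not values read from the backbone. It uses the usual floor-mode pooling formula with the stride defaulting to the kernel size.

```python
def avg_pool_out_len(in_len, kernel, stride=None, padding=0):
    """Output length along one axis for a fixed-kernel average pool (floor mode)."""
    stride = kernel if stride is None else stride  # stride defaults to the kernel size
    return (in_len + 2 * padding - kernel) // stride + 1


def adaptive_pool_out_len(in_len, target_len):
    """Adaptive pooling yields the requested length regardless of the input length."""
    return target_len


# ASSUMPTION: feature-map widths entering the pooling layer, picked to match the
# [1, 40] and [1, 19] outputs reported in the thread (not taken from the backbone).
FEATURE_WIDTHS = {320: 80, 150: 38}


def compare(image_width):
    """Return (training sequence length, evaluation sequence length) for a given width."""
    feat_w = FEATURE_WIDTHS[image_width]
    train_len = adaptive_pool_out_len(feat_w, 40)  # training branch: fixed width of 40
    eval_len = avg_pool_out_len(feat_w, kernel=2)  # evaluation branch: kernel width 2
    return train_len, eval_len


for w in (320, 150):
    train_len, eval_len = compare(w)
    print(f"image width {w}: train seq len {train_len}, eval seq len {eval_len}")
```

Under these assumptions the two branches agree at width 320 (40 and 40) but diverge at width 150 (40 in training, 19 in evaluation), so the head sees a different sequence length at evaluation time than it was trained on.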
Problem Description
For about a month I have been struggling with a problem that appears (specifically with PPOCRv4) when I change the image resolution to [3,32,150] instead of the default [3,48,320]. PPOCRv4 introduces MultiScaleDataset for training and SimpleDataset for evaluation in its config, whereas the PPOCRv3 config used SimpleDataset for both.
With the resolution set to [3,32,150] in the v4 config, training accuracy is high (around 95%) while evaluation gives very poor results. This is not a case of overfitting: I trained on train+eval data, and the logs showed around 98% training accuracy while evaluation on the eval dataset stayed around 40%. I also tried setting both the train and eval dataloaders to MultiScaleDataset, and both to SimpleDataset, but neither helped.
I cannot understand how training on some data shows 98% accuracy or more while evaluation on a small subset of that same data gives accuracies around 40%. The same happens with the training data itself: the training logs show 98% accuracy, but when I evaluate the same training data with best_accuracy.pdparams the accuracy is 58%. What could be the reason for this discrepancy?
Runtime Environment
Reproduction Code
My config is:

    Global:
      debug: true
      use_gpu: true
      epoch_num: 500
      log_smooth_window: 20
      print_batch_step: 10
      save_model_dir: ./output/finally_v4
      save_epoch_step: 100
      eval_batch_step:
    Optimizer:
      name: Adam
      beta1: 0.9
      beta2: 0.999
      lr:
        name: Cosine
        learning_rate: 0.0005
        warmup_epoch: 5
      regularizer:
        name: L2
        factor: 3.0e-05
    Architecture:
      model_type: rec
      algorithm: SVTR_LCNet
      Transform: null
      Backbone:
        name: PPLCNetV3
        scale: 0.95
      Head:
        name: MultiHead
        head_list:
My train data is T1+V1 and my eval data is V1.
Complete Error Message
    ppocr INFO: epoch: [199/500], global_step: 71970, lr: 0.000337, acc: 0.950521, norm_edit_dis: 0.993425, CTCLoss: 0.303188, NRTRLoss: 0.711274, loss: 1.013391, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.78093 s, avg_samples: 84.8, ips: 108.58877 samples/s, eta: 1 day, 4:10:31, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
    [2024/07/03 05:54:06] ppocr INFO: epoch: [199/500], global_step: 71980, lr: 0.000337, acc: 0.953125, norm_edit_dis: 0.993425, CTCLoss: 0.268923, NRTRLoss: 0.712596, loss: 0.986175, avg_reader_cost: 0.00024 s, avg_batch_cost: 0.79516 s, avg_samples: 64.0, ips: 80.48709 samples/s, eta: 1 day, 4:10:19, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
    [2024/07/03 05:54:14] ppocr INFO: epoch: [199/500], global_step: 71990, lr: 0.000337, acc: 0.958333, norm_edit_dis: 0.992793, CTCLoss: 0.238311, NRTRLoss: 0.712596, loss: 0.950907, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.79268 s, avg_samples: 75.2, ips: 94.86814 samples/s, eta: 1 day, 4:10:08, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
    [2024/07/03 05:54:22] ppocr INFO: epoch: [199/500], global_step: 72000, lr: 0.000337, acc: 0.942708, norm_edit_dis: 0.992840, CTCLoss: 0.244233, NRTRLoss: 0.708435, loss: 0.952667, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.77737 s, avg_samples: 78.4, ips: 100.85287 samples/s, eta: 1 day, 4:09:57, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
    eval model:: 100%|██████████| 34/34 [00:04<00:00, 8.21it/s]
    [2024/07/03 05:54:26] ppocr INFO: cur metric, acc: 0.48791821447974393, norm_edit_dis: 0.9153030167021731, fps: 2919.5675084361587
    [2024/07/03 05:54:26] ppocr INFO: best metric, acc: 0.5864312254032732, is_float16: False, norm_edit_dis: 0.9353985370995163, fps: 2784.2707554228186, best_epoch: 188
    [2024/07/03 05:54:34] ppocr INFO: epoch: [199/500], global_step: 72010, lr: 0.000337, acc: 0.955729, norm_edit_dis: 0.994558, CTCLoss: 0.214817, NRTRLoss: 0.706834, loss: 0.922900, avg_reader_cost: 0.00024 s, avg_batch_cost: 0.78858 s, avg_samples: 72.0, ips: 91.30309 samples/s, eta: 1 day, 4:09:45, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
    [2024/07/03 05:54:42] ppocr INFO: epoch: [199/500], global_step: 72020, lr: 0.000337, acc: 0.953125, norm_edit_dis: 0.993750, CTCLoss: 0.224658, NRTRLoss: 0.709030, loss: 0.934898, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.78265 s, avg_samples: 67.2, ips: 85.86192 samples/s, eta: 1 day, 4:09:34, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
Possible solutions
Please help me resolve this issue.
Appendix