PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

PaddleOCR api call results on inference model don't make any sense #14286

Open Pedro69491 opened 3 days ago

Pedro69491 commented 3 days ago

🐛 Bug (Problem description)

I have been training a PaddleOCR recognition model on a small dataset of digits. After 25 epochs the training accuracy reaches 100%, and evaluating the model also gives 100% accuracy. The problem is that when I test the model on the exact same images I used for evaluation, I get completely different results. I exported the best weights from the trained recognition model before running inference.

from PaddleOCR.paddleocr import PaddleOCR

ocr = PaddleOCR(
    use_gpu=False,
    rec_char_dict_path='./digit_dict.txt',
    rec_model_dir="./PaddleOCR/inference/rec_digits",  # Path to the saved model
)

# image_path points to one of the images used for evaluation
result = ocr.ocr(image_path)

Output: a Chinese character instead of a number.

Note: digit_dict.txt is a small text file containing the digits 0 to 9.

I also tried infer_rec.py, but the results were again poor, even on data we had already used for validation. I am not sure what to do next.

🏃‍♂️ Environment (Runtime environment)

Linux: Ubuntu 22.04
Python: 3.10.12

🌰 Minimal Reproducible Example (Minimal demo that reproduces the problem)

python3 tools/train.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.use_gpu=False

python3 tools/eval.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.checkpoints=./output/v3_en_mobile/iter_epoch_24.pdparams Global.use_gpu=False

python3 tools/export_model.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.checkpoints=./output/v3_en_mobile/iter_epoch_24.pdparams Global.save_inference_dir=./inference/rec_digits

GreatV commented 3 days ago

The issue you're facing, where the PaddleOCR inference model produces incorrect predictions (e.g., Chinese symbols instead of digits) despite correct training and evaluation results, is a common problem that arises in OCR workflows due to mismatched configurations or issues during model export and inference. Below are the likely causes and steps to resolve the issue:


1. Mismatch in rec_char_dict_path

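Make sure the dictionary passed at inference is the same file that was used for training, with one character per line. Example: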

0
1
2
3
4
5
6
7
8
9

If there is any discrepancy in the character dictionary, the inference model may produce unexpected characters, such as Chinese symbols, because it interprets the output indices incorrectly.
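A quick sanity check on the dictionary file can help rule this cause in or out. The sketch below is not part of PaddleOCR: `load_char_dict`, `check_char_dict`, and `expected_ctc_classes` are hypothetical helper names. It assumes the usual CTC layout, in which the decoder adds one blank class and, when `use_space_char` is enabled, one space class on top of the dictionary entries.

```python
from pathlib import Path


def load_char_dict(path):
    """Read a dictionary file: one character per line, order preserved."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line]


def check_char_dict(chars):
    """Return a list of human-readable problems found in the dictionary."""
    problems = []
    seen = set()
    for lineno, ch in enumerate(chars, start=1):
        if len(ch) != 1:
            problems.append(f"line {lineno}: expected one character, got {ch!r}")
        if ch in seen:
            problems.append(f"line {lineno}: duplicate entry {ch!r}")
        seen.add(ch)
    return problems


def expected_ctc_classes(chars, use_space_char=True):
    """Assumed output size: dictionary + CTC blank (+ space if enabled)."""
    return len(chars) + 1 + (1 if use_space_char else 0)


if __name__ == "__main__":
    digits = [str(d) for d in range(10)]
    print(check_char_dict(digits))
    print(expected_ctc_classes(digits, use_space_char=True))
```

If the class count expected from the dictionary differs from the size of the model's output layer, the dictionary or the `use_space_char` setting probably changed between training and inference.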


2. Incorrect Model Export

When exporting the trained model to an inference model, ensure that the correct checkpoints and configurations are used. Verify that the following steps were executed correctly:

Command you ran:

python3 tools/export_model.py \
  -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml \
  -o Global.checkpoints=./output/v3_en_mobile/iter_epoch_24.pdparams \
     Global.save_inference_dir=./inference/rec_digits

Verify the above command and ensure the paths are correct. If there was an issue during export, the inference model may not align with the trained weights.
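One way to verify the export is to confirm that the output directory actually contains non-empty model files. The following is a minimal sketch, assuming a Paddle 2.x export that writes `inference.pdmodel` and `inference.pdiparams`; those file names are an assumption and may differ in other versions, and `missing_export_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: file names written by a Paddle 2.x static-graph export.
EXPECTED_FILES = ("inference.pdmodel", "inference.pdiparams")


def missing_export_files(export_dir, expected=EXPECTED_FILES):
    """Return the expected files that are absent or empty in export_dir."""
    root = Path(export_dir)
    missing = []
    for name in expected:
        candidate = root / name
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Path taken from the export command above.
    print(missing_export_files("./inference/rec_digits"))
```

An empty list only shows that the files exist; it does not prove that the trained weights were loaded before export, so the export log is still worth reading.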


3. Inference Configuration Issues

The issue may lie in the parameters passed to the PaddleOCR class during inference. Based on your code:

ocr = PaddleOCR(
    use_gpu=False,
    rec_char_dict_path='./digit_dict.txt',
    rec_model_dir="./PaddleOCR/inference/rec_digits"
)

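Check that rec_char_dict_path points to the dictionary used in training and that rec_model_dir points to the exported model directory.

If the configuration is incorrect, the model may fail to interpret its predictions properly.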
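As an illustration, the sketch below collects the recognition-related arguments in one place and runs recognition only. It assumes the PaddleOCR 2.x Python API installed from pip (`from paddleocr import PaddleOCR`, and `ocr(..., det=False, cls=False)`), and it assumes the input images are already cropped to the text, so detection can be skipped. The values for `use_space_char` and `rec_image_shape` are placeholders that should mirror the training config; `build_rec_kwargs` and `recognize_crop` are hypothetical helper names.

```python
def build_rec_kwargs(model_dir, dict_path):
    """Keyword arguments for a recognition-only PaddleOCR instance."""
    return {
        "use_gpu": False,
        "use_angle_cls": False,
        "rec_model_dir": model_dir,
        "rec_char_dict_path": dict_path,
        # Assumed to mirror the training config; adjust if yours differs.
        "use_space_char": True,
        "rec_image_shape": "3, 48, 320",
    }


def recognize_crop(image_path, model_dir, dict_path):
    """Run recognition only (no detection, no angle classifier)."""
    # Imported lazily so the helper above can be used without paddleocr.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(**build_rec_kwargs(model_dir, dict_path))
    return ocr.ocr(image_path, det=False, cls=False)
```

Keeping the arguments in a single function makes it easier to compare them line by line against the training config.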


4. Model Evaluation vs. Inference

Even though the model achieves 100% accuracy during evaluation, this does not guarantee it will work perfectly during inference. The evaluation step uses the training pipeline, which differs slightly from the inference pipeline in terms of preprocessing and postprocessing. Here are some steps to debug:

Compare the results from infer_rec.py with those from the PaddleOCR API. If the results from infer_rec.py are correct, the issue may lie in how the PaddleOCR class is configured.
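The comparison can be made systematic by scoring both sets of predictions against the validation labels. This is a minimal sketch, assuming the label file uses one `image_path<TAB>label` entry per line and that predictions have been collected into plain dictionaries keyed by image path; `parse_label_file`, `accuracy`, and `disagreements` are hypothetical helpers, not PaddleOCR functions.

```python
def parse_label_file(text):
    """Parse 'image_path<TAB>label' lines into a {path: label} dict."""
    labels = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        path, _, label = line.partition("\t")
        labels[path] = label
    return labels


def accuracy(predictions, labels):
    """Fraction of labelled images whose prediction matches exactly."""
    if not labels:
        return 0.0
    hits = sum(1 for path, label in labels.items()
               if predictions.get(path) == label)
    return hits / len(labels)


def disagreements(preds_a, preds_b):
    """Image paths present in both runs whose predictions differ."""
    shared = sorted(set(preds_a) & set(preds_b))
    return [path for path in shared if preds_a[path] != preds_b[path]]
```

If one pipeline scores near 100% and the other does not, the list returned by `disagreements` gives concrete images to inspect first.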


5. Preprocessing Differences

Differences in preprocessing between training and inference can also cause discrepancies. For example, verify the image_shape in your config file (e.g., [3, 48, 320] for PP-OCRv3 recognition) and ensure it matches the rec_image_shape applied in your inference script.
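To make the effect of the image shape concrete, the sketch below computes the width an image would be resized to before padding, and the pixel normalization that typically follows. It assumes the common recognition preprocessing (resize to the target height while keeping the aspect ratio, cap at the target width, scale pixels to [-1, 1]); the exact behavior in a given PaddleOCR version should be checked against its source, and both function names are hypothetical.

```python
import math


def resized_width(img_h, img_w, target_h=48, target_w=320):
    """Width after an aspect-preserving resize to target_h, capped at target_w."""
    ratio = img_w / float(img_h)
    return min(target_w, int(math.ceil(target_h * ratio)))


def normalize_pixel(value):
    """Map an 8-bit pixel value to the range [-1, 1]."""
    return (value / 255.0 - 0.5) / 0.5


if __name__ == "__main__":
    # A square 28x28 digit crop fills only part of a 48x320 input;
    # the rest would be padding.
    print(resized_width(28, 28))
    print(normalize_pixel(0), normalize_pixel(255))
```

A mismatch in target height or width between training and inference changes how much of the input is padding, which can be enough to shift predictions.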


6. Debugging Steps

  1. Check Model Export Logs: Ensure no warnings or errors occurred during model export.
  2. Verify Character Set: Use the same digit_dict.txt file across training, evaluation, export, and inference.
  3. Test with infer_rec.py: Run the exported model using the infer_rec.py script and compare results.
  4. Inspect Preprocessing: Verify that the preprocessing pipeline used during inference matches the training pipeline.


Conclusion

To resolve your issue:

  1. Verify the digit_dict.txt file and ensure it is correctly referenced during all steps.
  2. Re-export the model using consistent configurations and checkpoints.
  3. Test the model with infer_rec.py to isolate whether the issue lies in the export process or the PaddleOCR API configuration.
  4. Align preprocessing between training and inference.

If the problem persists, share the outputs of infer_rec.py and your configuration file for further debugging.


Response generated by 🤖 feifei-bot | chatgpt-4o-latest

Pedro69491 commented 3 days ago

python3 tools/infer_rec.py \
  -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml \
  -o Global.infer_rec_model_dir="./inference/rec_digits" \
     Global.infer_img="../digits/valid/0/7616.png" \
     Global.use_gpu=False

[2024/11/27 15:58:33] ppocr INFO: Architecture :
[2024/11/27 15:58:33] ppocr INFO: Backbone :
[2024/11/27 15:58:33] ppocr INFO: last_conv_stride : [1, 2]
[2024/11/27 15:58:33] ppocr INFO: last_pool_kernel_size : [2, 2]
[2024/11/27 15:58:33] ppocr INFO: last_pool_type : avg
[2024/11/27 15:58:33] ppocr INFO: name : MobileNetV1Enhance
[2024/11/27 15:58:33] ppocr INFO: scale : 0.5
[2024/11/27 15:58:33] ppocr INFO: Head :
[2024/11/27 15:58:33] ppocr INFO: head_list :
[2024/11/27 15:58:33] ppocr INFO: CTCHead :
[2024/11/27 15:58:33] ppocr INFO: Head :
[2024/11/27 15:58:33] ppocr INFO: fc_decay : 1e-05
[2024/11/27 15:58:33] ppocr INFO: Neck :
[2024/11/27 15:58:33] ppocr INFO: depth : 2
[2024/11/27 15:58:33] ppocr INFO: dims : 64
[2024/11/27 15:58:33] ppocr INFO: hidden_dims : 120
[2024/11/27 15:58:33] ppocr INFO: name : svtr
[2024/11/27 15:58:33] ppocr INFO: use_guide : True
[2024/11/27 15:58:33] ppocr INFO: SARHead :
[2024/11/27 15:58:33] ppocr INFO: enc_dim : 512
[2024/11/27 15:58:33] ppocr INFO: max_text_length : 25
[2024/11/27 15:58:33] ppocr INFO: name : MultiHead
[2024/11/27 15:58:33] ppocr INFO: Transform : None
[2024/11/27 15:58:33] ppocr INFO: algorithm : SVTR_LCNet
[2024/11/27 15:58:33] ppocr INFO: model_type : rec
[2024/11/27 15:58:33] ppocr INFO: Eval :
[2024/11/27 15:58:33] ppocr INFO: dataset :
[2024/11/27 15:58:33] ppocr INFO: data_dir : ../digits/valid
[2024/11/27 15:58:33] ppocr INFO: label_file_list : ['../digits/labels/valid.txt']
[2024/11/27 15:58:33] ppocr INFO: name : SimpleDataSet
[2024/11/27 15:58:33] ppocr INFO: transforms :
[2024/11/27 15:58:33] ppocr INFO: DecodeImage :
[2024/11/27 15:58:33] ppocr INFO: channel_first : False
[2024/11/27 15:58:33] ppocr INFO: img_mode : BGR
[2024/11/27 15:58:33] ppocr INFO: MultiLabelEncode : None
[2024/11/27 15:58:33] ppocr INFO: RecResizeImg :
[2024/11/27 15:58:33] ppocr INFO: image_shape : [3, 48, 320]
[2024/11/27 15:58:33] ppocr INFO: KeepKeys :
[2024/11/27 15:58:33] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2024/11/27 15:58:33] ppocr INFO: loader :
[2024/11/27 15:58:33] ppocr INFO: batch_size_per_card : 128
[2024/11/27 15:58:33] ppocr INFO: drop_last : False
[2024/11/27 15:58:33] ppocr INFO: num_workers : 4
[2024/11/27 15:58:33] ppocr INFO: shuffle : False
[2024/11/27 15:58:33] ppocr INFO: Global :
[2024/11/27 15:58:33] ppocr INFO: cal_metric_during_train : True
[2024/11/27 15:58:33] ppocr INFO: character_dict_path : ../digit_dict.txt
[2024/11/27 15:58:33] ppocr INFO: checkpoints : None
[2024/11/27 15:58:33] ppocr INFO: debug : False
[2024/11/27 15:58:33] ppocr INFO: distributed : False
[2024/11/27 15:58:33] ppocr INFO: epoch_num : 25
[2024/11/27 15:58:33] ppocr INFO: eval_batch_step : [0, 2000]
[2024/11/27 15:58:33] ppocr INFO: infer_img : ../digits/valid/0/7616.png
[2024/11/27 15:58:33] ppocr INFO: infer_mode : False
[2024/11/27 15:58:33] ppocr INFO: infer_rec_model_dir : ./inference/rec_digits
[2024/11/27 15:58:33] ppocr INFO: log_smooth_window : 20
[2024/11/27 15:58:33] ppocr INFO: max_text_length : 25
[2024/11/27 15:58:33] ppocr INFO: pretrained_model : None
[2024/11/27 15:58:33] ppocr INFO: print_batch_step : 10
[2024/11/27 15:58:33] ppocr INFO: save_epoch_step : 3
[2024/11/27 15:58:33] ppocr INFO: save_inference_dir : None
[2024/11/27 15:58:33] ppocr INFO: save_model_dir : ./output/v3_en_mobile
[2024/11/27 15:58:33] ppocr INFO: save_res_path : ./output/rec/predicts_ppocrv3_en.txt
[2024/11/27 15:58:33] ppocr INFO: use_gpu : False
[2024/11/27 15:58:33] ppocr INFO: use_space_char : True
[2024/11/27 15:58:33] ppocr INFO: use_visualdl : False
[2024/11/27 15:58:33] ppocr INFO: Loss :
[2024/11/27 15:58:33] ppocr INFO: loss_config_list :
[2024/11/27 15:58:33] ppocr INFO: CTCLoss : None
[2024/11/27 15:58:33] ppocr INFO: SARLoss : None
[2024/11/27 15:58:33] ppocr INFO: name : MultiLoss
[2024/11/27 15:58:33] ppocr INFO: Metric :
[2024/11/27 15:58:33] ppocr INFO: ignore_space : False
[2024/11/27 15:58:33] ppocr INFO: main_indicator : acc
[2024/11/27 15:58:33] ppocr INFO: name : RecMetric
[2024/11/27 15:58:33] ppocr INFO: Optimizer :
[2024/11/27 15:58:33] ppocr INFO: beta1 : 0.9
[2024/11/27 15:58:33] ppocr INFO: beta2 : 0.999
[2024/11/27 15:58:33] ppocr INFO: lr :
[2024/11/27 15:58:33] ppocr INFO: learning_rate : 0.001
[2024/11/27 15:58:33] ppocr INFO: name : Cosine
[2024/11/27 15:58:33] ppocr INFO: warmup_epoch : 5
[2024/11/27 15:58:33] ppocr INFO: name : Adam
[2024/11/27 15:58:33] ppocr INFO: regularizer :
[2024/11/27 15:58:33] ppocr INFO: factor : 3e-05
[2024/11/27 15:58:33] ppocr INFO: name : L2
[2024/11/27 15:58:33] ppocr INFO: PostProcess :
[2024/11/27 15:58:33] ppocr INFO: name : CTCLabelDecode
[2024/11/27 15:58:33] ppocr INFO: Train :
[2024/11/27 15:58:33] ppocr INFO: dataset :
[2024/11/27 15:58:33] ppocr INFO: data_dir : ../digits/train/
[2024/11/27 15:58:33] ppocr INFO: ext_op_transform_idx : 1
[2024/11/27 15:58:33] ppocr INFO: label_file_list : ['../digits/labels/train.txt']
[2024/11/27 15:58:33] ppocr INFO: name : SimpleDataSet
[2024/11/27 15:58:33] ppocr INFO: transforms :
[2024/11/27 15:58:33] ppocr INFO: DecodeImage :
[2024/11/27 15:58:33] ppocr INFO: channel_first : False
[2024/11/27 15:58:33] ppocr INFO: img_mode : BGR
[2024/11/27 15:58:33] ppocr INFO: RecConAug :
[2024/11/27 15:58:33] ppocr INFO: ext_data_num : 2
[2024/11/27 15:58:33] ppocr INFO: image_shape : [48, 320, 3]
[2024/11/27 15:58:33] ppocr INFO: max_text_length : 25
[2024/11/27 15:58:33] ppocr INFO: prob : 0.5
[2024/11/27 15:58:33] ppocr INFO: RecAug : None
[2024/11/27 15:58:33] ppocr INFO: MultiLabelEncode : None
[2024/11/27 15:58:33] ppocr INFO: RecResizeImg :
[2024/11/27 15:58:33] ppocr INFO: image_shape : [3, 48, 320]
[2024/11/27 15:58:33] ppocr INFO: KeepKeys :
[2024/11/27 15:58:33] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2024/11/27 15:58:33] ppocr INFO: loader :
[2024/11/27 15:58:33] ppocr INFO: batch_size_per_card : 8
[2024/11/27 15:58:33] ppocr INFO: drop_last : True
[2024/11/27 15:58:33] ppocr INFO: num_workers : 4
[2024/11/27 15:58:33] ppocr INFO: shuffle : True
[2024/11/27 15:58:33] ppocr INFO: profiler_options : None
[2024/11/27 15:58:33] ppocr INFO: train with paddle 2.6.2 and device Place(cpu)
[2024/11/27 15:58:33] ppocr INFO: train from scratch
[2024/11/27 15:58:33] ppocr INFO: infer_img: ../digits/valid/0/7616.png
[2024/11/27 15:58:34] ppocr INFO: result: 76 0.09323589503765106
[2024/11/27 15:58:34] ppocr INFO: success!

I verified digit_dict and the exports, and everything looks fine.

It should have given me 0.