PaddleOCR api call results on inference model don't make any sense

Pedro69491 commented 3 days ago

🔎 Search before asking

[X] I have searched the PaddleOCR Docs and found no similar bug report.
[X] I have searched the PaddleOCR Issues and found no similar bug report.
[X] I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (Problem Description)

I have been using PaddleOCR's training capabilities on a small dataset of digits. After 25 epochs the model reaches 100% accuracy, and when I then evaluate it I also get 100% accuracy. The problem is that when I test the model on the exact same images I used for evaluation, I get completely different results. Of course, I exported the best weights from the trained recognition model.

```
from PaddleOCR.paddleocr import PaddleOCR

ocr = PaddleOCR(
    use_gpu=False,
    rec_char_dict_path='./digit_dict.txt',
    rec_model_dir="./PaddleOCR/inference/rec_digits",  # Path to the saved model
)

result = ocr.ocr(image_path)
```

Output: a Chinese character instead of a digit.

Note: digit_dict.txt is a small text file containing the digits 0 to 9.

I also tried infer_rec.py, but the results were wrong again, even on data we had already used for validation. I'm not sure what to try next.

🏃‍♂️ Environment (Runtime Environment)

Linux: Ubuntu 22.04, Python: 3.10.12

🌰 Minimal Reproducible Example (Minimal Demo That Reproduces the Problem)

```
python3 tools/train.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.use_gpu=False

python3 tools/eval.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.checkpoints=./output/v3_en_mobile/iter_epoch_24.pdparams Global.use_gpu=False

python3 tools/export_model.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.checkpoints=./output/v3_en_mobile/iter_epoch_24.pdparams Global.save_inference_dir=./inference/rec_digits
```

GreatV commented 3 days ago

The issue you're facing, where the PaddleOCR inference model produces incorrect predictions (e.g., Chinese characters instead of digits) even though training and evaluation look correct, usually comes from a configuration mismatch between training and inference or from a problem during model export. Below are the likely causes and steps to resolve the issue:

1. Mismatch in `rec_char_dict_path`

Ensure that the same character dictionary is used during training, evaluation, export, and inference: `Global.character_dict_path` in the training config and `rec_char_dict_path` at inference time.
Your digit_dict.txt should contain only the digits 0-9, one per line. Double-check that the file contains no non-numeric characters, trailing spaces, or blank lines.

Example: a valid digit_dict.txt for this task has ten lines, `0` through `9`, and nothing else.

If there is any discrepancy in the character dictionary, the inference model may produce unexpected characters, such as Chinese characters, because the decoder maps the model's output indices through a different character table.
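To make the index-to-character point concrete, here is a minimal, self-contained sketch of greedy CTC decoding. It assumes the convention PaddleOCR's `CTCLabelDecode` follows as we understand it (index 0 is the CTC blank, index i maps to line i of the dictionary, and a space is appended when `use_space_char` is True); `load_charset` and `ctc_greedy_decode` are hypothetical helpers, not PaddleOCR code. The same model output decoded with a different dictionary yields different characters.

```python
from typing import List, Sequence


def load_charset(dict_lines: Sequence[str], use_space_char: bool = True) -> List[str]:
    """Index -> character table: blank first, then dictionary lines, then optional space."""
    chars = [line.rstrip("\r\n") for line in dict_lines]
    if use_space_char:
        chars.append(" ")
    return ["<blank>"] + chars


def ctc_greedy_decode(step_scores: Sequence[Sequence[float]], charset: List[str]) -> str:
    """Greedy CTC: argmax per time step, merge consecutive repeats, drop blanks (index 0)."""
    best = [max(range(len(row)), key=row.__getitem__) for row in step_scores]
    out = []
    prev = None
    for idx in best:
        if idx != prev and idx != 0:
            out.append(charset[idx])
        prev = idx
    return "".join(out)


def one_hot(index: int, size: int) -> List[float]:
    """Fake model output for one time step that is certain about `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


digits = [str(d) for d in range(10)]
other = list("的一是不了在人有我他")  # hypothetical stand-in dictionary, not PaddleOCR's real one
scores = [one_hot(i, 12) for i in (8, 8, 0, 1)]  # 12 = blank + 10 entries + space
print(ctc_greedy_decode(scores, load_charset(digits)))  # 70
print(ctc_greedy_decode(scores, load_charset(other)))   # 有的
```

The indices the network emits are identical in both calls; only the lookup table changes, which is why a dictionary mismatch shows up as "wrong alphabet" output rather than an error.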

2. Incorrect Model Export

When exporting the trained model to an inference model, ensure that the correct checkpoints and configurations are used. Verify that the following steps were executed correctly:

The export_model.py script should reference the same configuration file (en_PP-OCRv3_rec.yml) and checkpoint file (iter_epoch_24.pdparams) as used during training and evaluation.
The Global.character_dict_path parameter in the configuration file should point to your digit_dict.txt.

Command you ran:

```
python3 tools/export_model.py \
  -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml \
  -o Global.checkpoints=./output/v3_en_mobile/iter_epoch_24.pdparams \
     Global.save_inference_dir=./inference/rec_digits
```

Verify the above command and ensure the paths are correct. If there was an issue during export, the inference model may not align with the trained weights.
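One way to rule out drift between steps is to build the train, eval, and export commands from a single set of values, passing the dictionary explicitly each time with `-o Global.character_dict_path=...` (the key PaddleOCR recognition configs use). This is a minimal sketch: `build_cmd` is a hypothetical helper, the paths are the ones from this thread, and it only builds argument lists rather than running anything.

```python
import shlex
from typing import List

CONFIG = "configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml"
DICT_PATH = "../digit_dict.txt"  # one dictionary file shared by every step
CHECKPOINT = "./output/v3_en_mobile/iter_epoch_24.pdparams"


def build_cmd(script: str, *overrides: str) -> List[str]:
    """Hypothetical helper: same config and same dictionary for every PaddleOCR tool."""
    opts = [f"Global.character_dict_path={DICT_PATH}", "Global.use_gpu=False", *overrides]
    return ["python3", script, "-c", CONFIG, "-o", *opts]


train = build_cmd("tools/train.py")
evaluate = build_cmd("tools/eval.py", f"Global.checkpoints={CHECKPOINT}")
export = build_cmd(
    "tools/export_model.py",
    f"Global.checkpoints={CHECKPOINT}",
    "Global.save_inference_dir=./inference/rec_digits",
)

for cmd in (train, evaluate, export):
    # Print a copy-pasteable shell line; to execute instead, call
    # subprocess.run(cmd, check=True) from the PaddleOCR repository root.
    print(shlex.join(cmd))
```

Keeping the dictionary and checkpoint in one place means a typo can only exist once, and it will affect every step equally instead of silently splitting training from export.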

3. Inference Configuration Issues

The issue may lie in the parameters passed to the PaddleOCR class during inference. Based on your code:

```
ocr = PaddleOCR(
    use_gpu=False,
    rec_char_dict_path='./digit_dict.txt',
    rec_model_dir="./PaddleOCR/inference/rec_digits"
)
```

Check the following:

rec_model_dir should point to the folder containing the exported inference model files (inference.pdiparams, inference.pdiparams.info, and inference.pdmodel).
Ensure there are no typos in the rec_char_dict_path parameter.

If the configuration is incorrect, the model may fail to interpret its predictions properly.
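A quick pre-flight check along these lines can catch a wrong directory or a malformed dictionary before `PaddleOCR` is even constructed. This is a sketch: `check_rec_setup` is a hypothetical helper, the expected file names are the three listed above, and the "one character per line" rule assumes a single-character dictionary like digit_dict.txt.

```python
from pathlib import Path
from typing import List

EXPECTED_MODEL_FILES = ("inference.pdmodel", "inference.pdiparams", "inference.pdiparams.info")


def check_rec_setup(rec_model_dir: str, rec_char_dict_path: str) -> List[str]:
    """Return a list of problems; an empty list means the setup looks sane."""
    problems = []
    model_dir = Path(rec_model_dir)
    for name in EXPECTED_MODEL_FILES:
        if not (model_dir / name).is_file():
            problems.append(f"missing {model_dir / name}")

    dict_file = Path(rec_char_dict_path)
    if not dict_file.is_file():
        problems.append(f"dictionary not found: {dict_file.resolve()}")
        return problems

    lines = dict_file.read_text(encoding="utf-8").splitlines()
    for lineno, line in enumerate(lines, start=1):
        # Blank lines and stray spaces become extra "characters" in the table.
        if len(line) != 1:
            problems.append(f"line {lineno} is {line!r}, expected exactly one character")
    if len(set(lines)) != len(lines):
        problems.append("dictionary contains duplicate entries")
    return problems


# Paths as used in the PaddleOCR(...) call above.
print(check_rec_setup("./PaddleOCR/inference/rec_digits", "./digit_dict.txt"))
```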

4. Model Evaluation vs. Inference

Even though the model achieves 100% accuracy during evaluation, this does not guarantee it will work perfectly during inference. The evaluation step uses the training pipeline, which differs slightly from the inference pipeline in terms of preprocessing and postprocessing. Here are some steps to debug:

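Run predict_rec.py: Use tools/infer/predict_rec.py to test the exported inference model directly on your test images, bypassing the PaddleOCR class. Note that tools/infer_rec.py is a different script: it loads trained weights from Global.pretrained_model (or Global.checkpoints), not from an exported inference directory, and preprocesses images with the config's Eval transforms. Run the command below from the PaddleOCR repository root, where the export step saved the model:
```
python3 tools/infer/predict_rec.py \
--image_dir=path_to_test_image \
--rec_model_dir=./inference/rec_digits \
--rec_char_dict_path=../digit_dict.txt \
--rec_image_shape="3, 48, 320" \
--use_gpu=False
```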

Compare the results from predict_rec.py with those from the PaddleOCR API. If predict_rec.py gives correct results, the issue likely lies in how the PaddleOCR class is configured; if it is also wrong, suspect the export step or the dictionary.
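To compare the two paths across the whole validation set rather than image by image, you can score each set of predictions against the validation label file. A minimal sketch, assuming the tab-separated `image_path<TAB>label` format that SimpleDataSet label files use; how you collect the predictions (from the standalone recognition script or from `ocr.ocr`) is up to you, and `read_labels` / `accuracy_against_labels` are hypothetical helpers. The sample data is made up.

```python
from typing import Dict, List, Tuple


def read_labels(text: str, delimiter: str = "\t") -> Dict[str, str]:
    """Parse 'image_path<TAB>label' lines into {image_path: label}, skipping blank lines."""
    labels = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        path, label = line.split(delimiter, 1)
        labels[path] = label
    return labels


def accuracy_against_labels(
    labels: Dict[str, str], predictions: Dict[str, str]
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Exact-match accuracy plus (path, expected, got) for every miss."""
    misses = []
    for path, expected in labels.items():
        got = predictions.get(path, "<missing>")
        if got != expected:
            misses.append((path, expected, got))
    acc = 1.0 - len(misses) / len(labels) if labels else 0.0
    return acc, misses


labels = read_labels("a.png\t0\nb.png\t3\n")  # hypothetical label file contents
api_preds = {"a.png": "76", "b.png": "3"}      # hypothetical predictions
acc, misses = accuracy_against_labels(labels, api_preds)
print(f"accuracy={acc:.2f}", misses)
```

Running this once per prediction source shows immediately whether the drop from eval accuracy happens in both inference paths or only in one.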

5. Preprocessing Differences

Differences in preprocessing between training and inference can also cause discrepancies. For example:

Resize settings, normalization, or padding during inference might differ from the training pipeline.
The image_shape of the RecResizeImg transform in your configuration file (for en_PP-OCRv3_rec.yml, [3, 48, 320]) should match the rec_image_shape used when running the inference model.

Verify that the image_shape in your config and the rec_image_shape passed at inference agree (e.g., "3, 48, 320" for PP-OCRv3 recognition models), and that your inference script applies the same preprocessing.
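As an illustration of why the shape matters, the sketch below computes the width a text crop is resized to for a given `[C, H, W]` target, following the aspect-ratio-preserving resize-then-pad scheme used for CTC recognition inputs as we understand it, plus the `(v / 255 - 0.5) / 0.5` pixel normalization. It is a simplification: PaddleOCR's inference code can also widen the target to the widest aspect ratio in a batch, which this ignores, and `rec_resize_width` / `normalize_pixel` are hypothetical names.

```python
import math
from typing import Sequence, Tuple


def rec_resize_width(img_h: int, img_w: int, image_shape: Sequence[int]) -> Tuple[int, int]:
    """Return (resized_w, padded_w): scale to height H keeping aspect ratio, cap at W, pad to W."""
    _, target_h, target_w = image_shape
    ratio = img_w / float(img_h)
    resized_w = min(target_w, int(math.ceil(target_h * ratio)))
    return resized_w, target_w


def normalize_pixel(value: int) -> float:
    """Map a 0..255 pixel value to [-1, 1]."""
    return (value / 255.0 - 0.5) / 0.5


# The same 32x64 digit crop under two different target shapes:
print(rec_resize_width(32, 64, [3, 48, 320]))  # (96, 320): 96 real columns, 224 padded
print(rec_resize_width(32, 64, [3, 32, 100]))  # (64, 100): 64 real columns, 36 padded
```

A model trained on 48-pixel-high inputs padded to 320 sees a very different tensor if inference resizes to 32x100, so a shape mismatch alone can degrade an otherwise correct model.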

6. Debugging Steps

Check Model Export Logs: Ensure no warnings or errors occurred during model export.
Verify Character Set: Use the same digit_dict.txt file across training, evaluation, export, and inference.
Test with predict_rec.py: Run the exported model with tools/infer/predict_rec.py and compare results.
Inspect Preprocessing: Verify that the preprocessing pipeline used during inference matches the training pipeline.

References to Related Issues

Issue #13972: After exporting the inference model, paddleocr command-line results are inconsistent
- This issue highlights inconsistencies in results due to differences in configurations or parameters during inference. Ensure that the character dictionary and model paths are consistent.
Discussion #14088: Recognition results are wrong after converting the train_model of a recognition model trained on a custom dataset into an inference_model
- Similar issue where the exported inference model produced incorrect results. The resolution involved double-checking the character dictionary and preprocessing settings.

Conclusion

To resolve your issue:

Verify the digit_dict.txt file and ensure it is correctly referenced during all steps.
Re-export the model using consistent configurations and checkpoints.
Test the exported model with tools/infer/predict_rec.py to isolate whether the issue lies in the export process or the PaddleOCR API configuration.
Align preprocessing between training and inference.

If the problem persists, share the output of predict_rec.py and your configuration file for further debugging.

Response generated by 🤖 feifei-bot | chatgpt-4o-latest

Pedro69491 commented 3 days ago

```
python3 tools/infer_rec.py \
  -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml \
  -o Global.infer_rec_model_dir="./inference/rec_digits" \
     Global.infer_img="../digits/valid/0/7616.png" \
     Global.use_gpu=False
[2024/11/27 15:58:33] ppocr INFO: Architecture :
[2024/11/27 15:58:33] ppocr INFO: Backbone :
[2024/11/27 15:58:33] ppocr INFO: last_conv_stride : [1, 2]
[2024/11/27 15:58:33] ppocr INFO: last_pool_kernel_size : [2, 2]
[2024/11/27 15:58:33] ppocr INFO: last_pool_type : avg
[2024/11/27 15:58:33] ppocr INFO: name : MobileNetV1Enhance
[2024/11/27 15:58:33] ppocr INFO: scale : 0.5
[2024/11/27 15:58:33] ppocr INFO: Head :
[2024/11/27 15:58:33] ppocr INFO: head_list :
[2024/11/27 15:58:33] ppocr INFO: CTCHead :
[2024/11/27 15:58:33] ppocr INFO: Head :
[2024/11/27 15:58:33] ppocr INFO: fc_decay : 1e-05
[2024/11/27 15:58:33] ppocr INFO: Neck :
[2024/11/27 15:58:33] ppocr INFO: depth : 2
[2024/11/27 15:58:33] ppocr INFO: dims : 64
[2024/11/27 15:58:33] ppocr INFO: hidden_dims : 120
[2024/11/27 15:58:33] ppocr INFO: name : svtr
[2024/11/27 15:58:33] ppocr INFO: use_guide : True
[2024/11/27 15:58:33] ppocr INFO: SARHead :
[2024/11/27 15:58:33] ppocr INFO: enc_dim : 512
[2024/11/27 15:58:33] ppocr INFO: max_text_length : 25
[2024/11/27 15:58:33] ppocr INFO: name : MultiHead
[2024/11/27 15:58:33] ppocr INFO: Transform : None
[2024/11/27 15:58:33] ppocr INFO: algorithm : SVTR_LCNet
[2024/11/27 15:58:33] ppocr INFO: model_type : rec
[2024/11/27 15:58:33] ppocr INFO: Eval :
[2024/11/27 15:58:33] ppocr INFO: dataset :
[2024/11/27 15:58:33] ppocr INFO: data_dir : ../digits/valid
[2024/11/27 15:58:33] ppocr INFO: label_file_list : ['../digits/labels/valid.txt']
[2024/11/27 15:58:33] ppocr INFO: name : SimpleDataSet
[2024/11/27 15:58:33] ppocr INFO: transforms :
[2024/11/27 15:58:33] ppocr INFO: DecodeImage :
[2024/11/27 15:58:33] ppocr INFO: channel_first : False
[2024/11/27 15:58:33] ppocr INFO: img_mode : BGR
[2024/11/27 15:58:33] ppocr INFO: MultiLabelEncode : None
[2024/11/27 15:58:33] ppocr INFO: RecResizeImg :
[2024/11/27 15:58:33] ppocr INFO: image_shape : [3, 48, 320]
[2024/11/27 15:58:33] ppocr INFO: KeepKeys :
[2024/11/27 15:58:33] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2024/11/27 15:58:33] ppocr INFO: loader :
[2024/11/27 15:58:33] ppocr INFO: batch_size_per_card : 128
[2024/11/27 15:58:33] ppocr INFO: drop_last : False
[2024/11/27 15:58:33] ppocr INFO: num_workers : 4
[2024/11/27 15:58:33] ppocr INFO: shuffle : False
[2024/11/27 15:58:33] ppocr INFO: Global :
[2024/11/27 15:58:33] ppocr INFO: cal_metric_during_train : True
[2024/11/27 15:58:33] ppocr INFO: character_dict_path : ../digit_dict.txt
[2024/11/27 15:58:33] ppocr INFO: checkpoints : None
[2024/11/27 15:58:33] ppocr INFO: debug : False
[2024/11/27 15:58:33] ppocr INFO: distributed : False
[2024/11/27 15:58:33] ppocr INFO: epoch_num : 25
[2024/11/27 15:58:33] ppocr INFO: eval_batch_step : [0, 2000]
[2024/11/27 15:58:33] ppocr INFO: infer_img : ../digits/valid/0/7616.png
[2024/11/27 15:58:33] ppocr INFO: infer_mode : False
[2024/11/27 15:58:33] ppocr INFO: infer_rec_model_dir : ./inference/rec_digits
[2024/11/27 15:58:33] ppocr INFO: log_smooth_window : 20
[2024/11/27 15:58:33] ppocr INFO: max_text_length : 25
[2024/11/27 15:58:33] ppocr INFO: pretrained_model : None
[2024/11/27 15:58:33] ppocr INFO: print_batch_step : 10
[2024/11/27 15:58:33] ppocr INFO: save_epoch_step : 3
[2024/11/27 15:58:33] ppocr INFO: save_inference_dir : None
[2024/11/27 15:58:33] ppocr INFO: save_model_dir : ./output/v3_en_mobile
[2024/11/27 15:58:33] ppocr INFO: save_res_path : ./output/rec/predicts_ppocrv3_en.txt
[2024/11/27 15:58:33] ppocr INFO: use_gpu : False
[2024/11/27 15:58:33] ppocr INFO: use_space_char : True
[2024/11/27 15:58:33] ppocr INFO: use_visualdl : False
[2024/11/27 15:58:33] ppocr INFO: Loss :
[2024/11/27 15:58:33] ppocr INFO: loss_config_list :
[2024/11/27 15:58:33] ppocr INFO: CTCLoss : None
[2024/11/27 15:58:33] ppocr INFO: SARLoss : None
[2024/11/27 15:58:33] ppocr INFO: name : MultiLoss
[2024/11/27 15:58:33] ppocr INFO: Metric :
[2024/11/27 15:58:33] ppocr INFO: ignore_space : False
[2024/11/27 15:58:33] ppocr INFO: main_indicator : acc
[2024/11/27 15:58:33] ppocr INFO: name : RecMetric
[2024/11/27 15:58:33] ppocr INFO: Optimizer :
[2024/11/27 15:58:33] ppocr INFO: beta1 : 0.9
[2024/11/27 15:58:33] ppocr INFO: beta2 : 0.999
[2024/11/27 15:58:33] ppocr INFO: lr :
[2024/11/27 15:58:33] ppocr INFO: learning_rate : 0.001
[2024/11/27 15:58:33] ppocr INFO: name : Cosine
[2024/11/27 15:58:33] ppocr INFO: warmup_epoch : 5
[2024/11/27 15:58:33] ppocr INFO: name : Adam
[2024/11/27 15:58:33] ppocr INFO: regularizer :
[2024/11/27 15:58:33] ppocr INFO: factor : 3e-05
[2024/11/27 15:58:33] ppocr INFO: name : L2
[2024/11/27 15:58:33] ppocr INFO: PostProcess :
[2024/11/27 15:58:33] ppocr INFO: name : CTCLabelDecode
[2024/11/27 15:58:33] ppocr INFO: Train :
[2024/11/27 15:58:33] ppocr INFO: dataset :
[2024/11/27 15:58:33] ppocr INFO: data_dir : ../digits/train/
[2024/11/27 15:58:33] ppocr INFO: ext_op_transform_idx : 1
[2024/11/27 15:58:33] ppocr INFO: label_file_list : ['../digits/labels/train.txt']
[2024/11/27 15:58:33] ppocr INFO: name : SimpleDataSet
[2024/11/27 15:58:33] ppocr INFO: transforms :
[2024/11/27 15:58:33] ppocr INFO: DecodeImage :
[2024/11/27 15:58:33] ppocr INFO: channel_first : False
[2024/11/27 15:58:33] ppocr INFO: img_mode : BGR
[2024/11/27 15:58:33] ppocr INFO: RecConAug :
[2024/11/27 15:58:33] ppocr INFO: ext_data_num : 2
[2024/11/27 15:58:33] ppocr INFO: image_shape : [48, 320, 3]
[2024/11/27 15:58:33] ppocr INFO: max_text_length : 25
[2024/11/27 15:58:33] ppocr INFO: prob : 0.5
[2024/11/27 15:58:33] ppocr INFO: RecAug : None
[2024/11/27 15:58:33] ppocr INFO: MultiLabelEncode : None
[2024/11/27 15:58:33] ppocr INFO: RecResizeImg :
[2024/11/27 15:58:33] ppocr INFO: image_shape : [3, 48, 320]
[2024/11/27 15:58:33] ppocr INFO: KeepKeys :
[2024/11/27 15:58:33] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2024/11/27 15:58:33] ppocr INFO: loader :
[2024/11/27 15:58:33] ppocr INFO: batch_size_per_card : 8
[2024/11/27 15:58:33] ppocr INFO: drop_last : True
[2024/11/27 15:58:33] ppocr INFO: num_workers : 4
[2024/11/27 15:58:33] ppocr INFO: shuffle : True
[2024/11/27 15:58:33] ppocr INFO: profiler_options : None
[2024/11/27 15:58:33] ppocr INFO: train with paddle 2.6.2 and device Place(cpu)
[2024/11/27 15:58:33] ppocr INFO: train from scratch
[2024/11/27 15:58:33] ppocr INFO: infer_img: ../digits/valid/0/7616.png
[2024/11/27 15:58:34] ppocr INFO: result: 76 0.09323589503765106
[2024/11/27 15:58:34] ppocr INFO: success!
```

I verified digit_dict and the exports, and everything looks fine.

It should have given me 0.
