PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

Table Recognition: Training model is transferred to the inference model, the prediction effect is inconsistent? #14278

Open vuthehuy1997 opened 2 days ago

vuthehuy1997 commented 2 days ago

🐛 Bug (problem description)

When I finished training the SLANet table recognition model, I exported it to the inference format, but the results were wrong. I only edited SLANet.yml into SLANet_finetune.yml, following the tutorial in the GitHub documentation.

Model training command (finetune from model ch_ppstructure_mobile_v2.0_SLANet_train)

python3 tools/train.py -c configs/table/SLANet_finetune.yml\
 -o Global.pretrained_model=./weights/whl/table/ch_ppstructure_mobile_v2.0_SLANet_train/best_accuracy \
 Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True

export command:

python3 tools/export_model.py -c configs/table/SLANet_finetune.yml -o Global.pretrained_model=../translate-pdf-scan/src/weights/table_rec/SLANet_ch/best_accuracy Global.save_inference_dir=../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_finetune

If I run the trained model with the command:

python3 tools/infer_table.py -c configs/table/SLANet_finetune.yml -o Global.pretrained_model=../translate-pdf-scan/src/weights/table_rec/SLANet_ch/best_accuracy Global.infer_img=ppstructure/docs/table/table.jpg

Then it gives correct results

result: ['<html>', '<body>', '<table>', '<tr>', '<td', ' colspan="2"', '>', '</td>', '<td', ' rowspan="2"', '>', '</td>', '<td', ' colspan="2"', '>', '</td>', '<td', ' rowspan="2"', '>', '</td>', '<td', ' rowspan="2"', '>', '</td>', '<td', ' rowspan="2"', '>', '</td>', '<td', ' rowspan="2"', '>', '</td>', '<td', ' rowspan="2"', '>', '</td>', '<td', ' rowspan="2"', '>', '</td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', ..., '</table>', '</body>', '</html>'], [[0.7451880574226379, 1.5897825956344604, 122.04891204833984, 1.5919700860977173, 122.1083755493164, 34.912353515625, 0.7465599775314331, 34.94774627685547], ...

If I run the exported model:

cd ppstructure/
python table/predict_structure.py\
 --table_model_dir=../../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_finetune \
 --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
 --image_dir=docs/table/table.jpg \
 --output=../output/table

Then the result will be wrong

['<html>', '<body>', '<table>', '</tbody>', '</tbody>', '</tbody>', '</tbody>', '</tbody>', '</thead>', '</tbody>', '</tbody>', '</thead>', '</thead>', '</tbody>', '</thead>', ..., '<tr>', '</table>', '</body>', '</html>'], [[2.9795612022098794e-07, 2.2673059163480502e-07, 8.608171120361163e-11, 9.731257932799053e-07, 8.894006642279351e-10, 3.4046809105348075e-07, 3.1248458043364735e-08, 17.488468170166016, 9.80240444370395e-10, 8.043410382185812e-09, ...

Does anyone know what the error is when exporting?
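
The difference between the two results can be checked mechanically. The sketch below is a hypothetical helper, not part of PaddleOCR. It assumes only the token format visible in the outputs (an opening '<td' may be followed by attribute tokens and a closing '>') and reports whether opening and closing tags nest correctly. The token lists are shortened versions of the ones above.

```python
import re

# Tag names taken from the tokens that appear in the two outputs above.
TRACKED = {"html", "body", "table", "thead", "tbody", "tr", "td"}


def structure_is_well_formed(tokens):
    """Return True if opening and closing tags in the token list nest correctly."""
    stack = []
    for token in tokens:
        # A token such as '<td></td>' holds two tags; attribute tokens and '>' hold none.
        for closing, name in re.findall(r"<(/?)([a-z]+)", token):
            if name not in TRACKED:
                continue
            if not closing:
                stack.append(name)
            elif not stack or stack.pop() != name:
                return False
    return not stack


good = ['<html>', '<body>', '<table>', '<tr>', '<td', ' colspan="2"', '>', '</td>',
        '<td></td>', '</tr>', '</table>', '</body>', '</html>']
bad = ['<html>', '<body>', '<table>', '</tbody>', '</thead>', '<tr>',
       '</table>', '</body>', '</html>']

print(structure_is_well_formed(good))  # True
print(structure_is_well_formed(bad))   # False
```

A check like this does not explain the failure, but it gives a quick pass/fail signal when comparing the checkpoint and the exported model on the same image.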

🏃‍♂️ Environment (runtime environment)

Ubuntu
paddlepaddle_gpu=3.0.0b0
NVIDIA A40
cuda 12.3

🌰 Minimal Reproducible Example (minimal demo that reproduces the problem)

Global:
  use_gpu: True
  epoch_num: 400
  log_smooth_window: 20
  print_batch_step: 20
  save_model_dir: ./output/SLANet_ch
  save_epoch_step: 400
  # evaluation is run every 331 iterations after the 0th iteration
  eval_batch_step: [0, 331]
  cal_metric_during_train: True
  pretrained_model: ./weights/whl/table/ch_ppstructure_mobile_v2.0_SLANet_train/best_accuracy.pdparams
  checkpoints: 
  save_inference_dir: ./output/SLANet_ch/infer
  use_visualdl: False
  infer_img: ppstructure/docs/table/table.jpg
  # for data or label process
  character_dict_path: ppocr/utils/dict/table_structure_dict_ch.txt
  character_type: en
  max_text_length: &max_text_length 500
  box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'
  infer_mode: False
  use_sync_bn: True
  save_res_path: output/infer

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  clip_norm: 5.0
  # lr:
  #   learning_rate: 0.001
  # regularizer:
  #   name: 'L2'
  #   factor: 0.00000
  lr:
    name: Cosine
    learning_rate: 0.00017 #
    warmup_epoch: 0
  regularizer:
    name: 'L2'
    factor: 0

Architecture:
  model_type: table
  algorithm: SLANet
  Backbone:
    name: PPLCNet
    scale: 1.0
    pretrained: True
    use_ssld: True
  Neck:
    name: CSPPAN
    out_channels: 96
  Head:
    name: SLAHead
    hidden_size: 256
    max_text_length: *max_text_length
    loc_reg_num: &loc_reg_num 8

Loss:
  name: SLALoss
  structure_weight: 1.0
  loc_weight: 2.0
  loc_loss: smooth_l1

PostProcess:
  name: TableLabelDecode
  merge_no_span_structure: &merge_no_span_structure True

Metric:
  name: TableMetric
  main_indicator: acc
  compute_bbox_metric: False
  loc_reg_num: *loc_reg_num
  box_format: *box_format
  del_thead_tbody: True

Train:
  dataset:
    name: PubTabDataSet
    data_dir: ./datasets/train/table/table_envi/
    label_file_list: [./datasets/train/table/table_envi/train.jsonl]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - TableLabelEncode:
          learn_empty_box: False
          merge_no_span_structure: *merge_no_span_structure
          replace_empty_cell_token: False
          loc_reg_num: *loc_reg_num
          max_text_length: *max_text_length
      - TableBoxEncode:
          in_box_format: *box_format
          out_box_format: *box_format
      - ResizeTableImage:
          max_len: 488
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - PaddingTableImage:
          size: [488, 488]
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'structure', 'bboxes', 'bbox_masks', 'length', 'shape']
  loader:
    shuffle: True
    batch_size_per_card: 16
    drop_last: True
    num_workers: 1

Eval:
  dataset:
    name: PubTabDataSet
    data_dir: ./datasets/train/table/table_envi/
    label_file_list: [./datasets/train/table/table_envi/val.jsonl]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - TableLabelEncode:
          learn_empty_box: False
          merge_no_span_structure: *merge_no_span_structure
          replace_empty_cell_token: False
          loc_reg_num: *loc_reg_num
          max_text_length: *max_text_length
      - TableBoxEncode:
          in_box_format: *box_format
          out_box_format: *box_format
      - ResizeTableImage:
          max_len: 488
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - PaddingTableImage:
          size: [488, 488]
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'structure', 'bboxes', 'bbox_masks', 'length', 'shape']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 16
    num_workers: 1
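
For reference, the image geometry implied by the transforms in this config can be written out by hand. This is a minimal sketch under assumptions: it takes ResizeTableImage to scale the longer side to max_len while keeping the aspect ratio, PaddingTableImage to pad up to the fixed size, and NormalizeImage to compute (pixel * scale - mean) / std. That is a reading of the parameter names, not a copy of the library code, and the function names are made up for illustration.

```python
def resized_shape(height, width, max_len=488):
    """Scale so the longer side equals max_len, keeping the aspect ratio (assumed behavior)."""
    ratio = max_len / max(height, width)
    return int(height * ratio), int(width * ratio)


def padding_needed(height, width, size=(488, 488)):
    """Rows and columns of padding needed to reach the fixed canvas size."""
    pad_h, pad_w = size[0] - height, size[1] - width
    if pad_h < 0 or pad_w < 0:
        raise ValueError("image is larger than the padding canvas")
    return pad_h, pad_w


def normalize_pixel(value, mean, std, scale=1.0 / 255.0):
    """Per-channel normalization with the scale, mean and std from the config."""
    return (value * scale - mean) / std


h, w = resized_shape(244, 976)   # a 244 x 976 image, ratio 0.5
print((h, w))                    # (122, 488)
print(padding_needed(h, w))      # (366, 0)
```

If the exported-model script used a different max_len or canvas size, the network would see differently shaped input than it saw during training, which is why these values are worth comparing.
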
GreatV commented 2 days ago

The issue you are experiencing, where the inference results using the exported model are inconsistent or incorrect compared to the results from training or direct inference with the best_accuracy checkpoint, is a common problem when exporting models for static inference. Below are potential causes and solutions based on the provided details and similar issues encountered in PaddleOCR:


Potential Causes

  1. Inconsistent Preprocessing Between Training and Inference

    • The preprocessing steps (e.g., image normalization, resizing, padding) during training or inference might differ from those applied in predict_structure.py. If the exported inference model uses preprocessing steps different from those used during training, this could lead to mismatched results.
  2. Postprocessing Differences

    • The way the predictions are decoded (e.g., parsing table structure) might vary between infer_table.py and predict_structure.py. This can cause discrepancies in the output.
  3. Character Dictionary Mismatch

    • During inference, the wrong character dictionary might be used, leading to incorrect decoding of the table structure.
  4. Export Command Configuration

    • The export process might not fully replicate the configuration used during training, such as missing parameters like character_dict_path, max_text_length, or merge_no_span_structure.
  5. Dynamic vs. Static Graph Differences

    • In PaddleOCR, the training model (best_accuracy) uses a dynamic computational graph, while the exported model uses a static graph. Some components (e.g., batch normalization or variable shapes) may behave differently in static graphs compared to dynamic ones.
  6. Error in the Exported Model

    • There might be a bug in the export_model.py implementation or the configuration passed to it, causing the exported model to incorrectly represent the trained model.

Steps to Debug and Resolve

1. Verify Preprocessing Consistency

2. Ensure Postprocessing Consistency

3. Confirm Character Dictionary

4. Pass All Necessary Parameters During Export

5. Run Direct Inference on Exported Model

6. Validate the Exported Model

7. Check for Known Issues

8. Debug Static Graph Behavior

9. Test with a Pretrained Model


Recommended Workflow

  1. Preprocessing/Postprocessing Check: Ensure preprocessing and postprocessing are consistent across training, inference, and export.
  2. Parameter Alignment: Pass all required parameters during both training and export.
  3. Export Validation: Test the exported model directly using tools/infer_table.py.
  4. Static Graph Debugging: Use infer_mode=True during training to debug differences in dynamic vs. static graph behavior.

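If none of these solutions resolve the issue, you might need to open a detailed issue on the PaddleOCR GitHub repository with a focus on the export process. Be sure to include the exact commands you ran, the configuration files used, and input/output examples from both infer_table.py and predict_structure.py.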


Response generated by 🤖 feifei-bot | chatgpt-4o-latest

GreatV commented 2 days ago

@vuthehuy1997 you may need to try PaddleOCR 2.8.1.

vuthehuy1997 commented 1 day ago

The issue you are experiencing, where the inference results using the exported model are inconsistent or incorrect compared to the results from training or direct inference with the best_accuracy checkpoint, is a common problem when exporting models for static inference. Below are potential causes and solutions based on the provided details and similar issues encountered in PaddleOCR:

Potential Causes

1. **Inconsistent Preprocessing Between Training and Inference**

   * The preprocessing steps (e.g., image normalization, resizing, padding) during training or inference might differ from those applied in `predict_structure.py`. If the exported inference model uses preprocessing steps different from those used during training, this could lead to mismatched results.

2. **Postprocessing Differences**

   * The way the predictions are decoded (e.g., parsing table structure) might vary between `infer_table.py` and `predict_structure.py`. This can cause discrepancies in the output.

3. **Character Dictionary Mismatch**

   * During inference, the wrong character dictionary might be used, leading to incorrect decoding of the table structure.

4. **Export Command Configuration**

   * The export process might not fully replicate the configuration used during training, such as missing parameters like `character_dict_path`, `max_text_length`, or `merge_no_span_structure`.

5. **Dynamic vs. Static Graph Differences**

   * In PaddleOCR, the training model (`best_accuracy`) uses a dynamic computational graph, while the exported model uses a static graph. Some components (e.g., batch normalization or variable shapes) may behave differently in static graphs compared to dynamic ones.

6. **Error in the Exported Model**

   * There might be a bug in the `export_model.py` implementation or the configuration passed to it, causing the exported model to incorrectly represent the trained model.

Steps to Debug and Resolve

1. Verify Preprocessing Consistency

* Compare the preprocessing steps in `SLANet_finetune.yml` with those in `predict_structure.py`. Key preprocessing components to check:

  * `DecodeImage`
  * `NormalizeImage` (mean, stddev)
  * `ResizeTableImage`
  * `PaddingTableImage`

* Ensure that the same transformations are applied in both training and inference.
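
One way to make this comparison concrete is to diff the two transform lists once they are available as Python data. The sketch below is a hypothetical helper. It assumes the transforms are given in the same list-of-single-key-dicts shape as the YAML config, and the inference-side values in the example are invented purely to show a mismatch.

```python
def transform_params(transforms):
    """Flatten [{'ResizeTableImage': {'max_len': 488}}, ...] into {op_name: params}."""
    flat = {}
    for entry in transforms:
        for name, params in entry.items():
            flat[name] = params or {}  # ops such as ToCHWImage have no parameters
    return flat


def diff_transforms(train, infer):
    """Return {op_name: (train_params, infer_params)} for ops that differ or are missing."""
    a, b = transform_params(train), transform_params(infer)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


train = [
    {"ResizeTableImage": {"max_len": 488}},
    {"PaddingTableImage": {"size": [488, 488]}},
    {"ToCHWImage": None},
]
# Hypothetical inference-side settings, chosen only to demonstrate a difference.
infer = [
    {"ResizeTableImage": {"max_len": 488}},
    {"PaddingTableImage": {"size": [512, 512]}},
    {"ToCHWImage": None},
]

print(diff_transforms(train, infer))
# {'PaddingTableImage': ({'size': [488, 488]}, {'size': [512, 512]})}
```

An empty result means the two pipelines agree on every listed operation and parameter. It does not check the order of operations.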

2. Ensure Postprocessing Consistency

* Check the `PostProcess` block in your configuration file and ensure that parameters like `merge_no_span_structure` are consistent with how `predict_structure.py` decodes the output.

* Modify `predict_structure.py` to align it with `infer_table.py`'s decoding logic if needed.

3. Confirm Character Dictionary

* Verify that the correct character dictionary (`table_structure_dict_ch.txt`) is used. Ensure that the same dictionary is referenced during training, inference, and export.
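
A small script can confirm that two dictionary files are identical token for token. This is a sketch with hypothetical helper names. It assumes one token per line in a UTF-8 text file, which matches how `table_structure_dict_ch.txt` is referenced in the commands but is otherwise an assumption.

```python
def load_dict(path):
    """Read one token per line, dropping line endings and blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f if line.strip()]


def first_mismatch(tokens_a, tokens_b):
    """Index of the first differing entry, or None when the lists are identical."""
    for i, (x, y) in enumerate(zip(tokens_a, tokens_b)):
        if x != y:
            return i
    if len(tokens_a) != len(tokens_b):
        return min(len(tokens_a), len(tokens_b))
    return None
```

Calling `first_mismatch(load_dict(path_used_for_training), load_dict(path_used_for_inference))` returns `None` when both stages decode with the same vocabulary. Any integer result points at the first line where they diverge.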

4. Pass All Necessary Parameters During Export

* Ensure that all critical parameters from `SLANet_finetune.yml` are passed during export. For example:
  ```shell
  python3 tools/export_model.py -c configs/table/SLANet_finetune.yml \
    -o Global.pretrained_model=../translate-pdf-scan/src/weights/table_rec/SLANet_ch/best_accuracy \
       Global.save_inference_dir=../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_finetune \
       Global.character_dict_path=ppocr/utils/dict/table_structure_dict_ch.txt \
       Global.max_text_length=500 \
       PostProcess.merge_no_span_structure=True
  ```

5. Run Direct Inference on Exported Model

* Test if the exported model is functional by using `tools/infer_table.py` directly on the exported model:
  ```shell
  python3 tools/infer_table.py -c configs/table/SLANet_finetune.yml \
    -o Global.pretrained_model=../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_finetune \
      Global.infer_img=ppstructure/docs/table/table.jpg
  ```

* If the result is correct, the issue lies specifically in `predict_structure.py`.

6. Validate the Exported Model

* Use the `paddle_infer` API to check whether the exported model is working as expected:
  ```shell
  paddle_infer --model_dir=../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_finetune \
    --image_dir=ppstructure/docs/table/table.jpg
  ```

* If the results are incorrect, the issue is likely during the export process.

7. Check for Known Issues

* Similar problems have been reported in PaddleOCR's GitHub repository. For example:

  * [Issue #14064](https://github.com/PaddlePaddle/PaddleOCR/issues/14064)
  * [Issue #14088](https://github.com/PaddlePaddle/PaddleOCR/discussions/14088)

* These issues often involve differences in preprocessing or bugs in the export script.

8. Debug Static Graph Behavior

* Some differences might arise due to the transition from a dynamic to a static graph. Try retraining with the `Global.infer_mode=True` option to force the model into inference mode during training:
  ```yaml
  Global:
    infer_mode: True
  ```

9. Test with a Pretrained Model

* As a sanity check, export the pretrained model (`ch_ppstructure_mobile_v2.0_SLANet_train`) without fine-tuning to ensure the export process works correctly:
  ```shell
  python3 tools/export_model.py -c configs/table/SLANet.yml \
    -o Global.pretrained_model=./weights/whl/table/ch_ppstructure_mobile_v2.0_SLANet_train/best_accuracy \
       Global.save_inference_dir=./output/pretrained_inference
  ```

Recommended Workflow

1. **Preprocessing/Postprocessing Check**: Ensure preprocessing and postprocessing are consistent across training, inference, and export.

2. **Parameter Alignment**: Pass all required parameters during both training and export.

3. **Export Validation**: Test the exported model directly using `tools/infer_table.py`.

4. **Static Graph Debugging**: Use `infer_mode=True` during training to debug differences in dynamic vs. static graph behavior.

If none of these solutions resolve the issue, you might need to open a detailed issue on the PaddleOCR GitHub repository with a focus on the export process. Be sure to include:

* The exact commands you ran.

* Configuration files used.

* Input/output examples from both `infer_table.py` and `predict_structure.py`.

Response generated by 🤖 feifei-bot | chatgpt-4o-latest

Thanks for your support

About Steps to Debug and Resolve

  1. Verify Preprocessing Consistency
  2. Ensure Postprocessing Consistency
  3. Confirm Character Dictionary
  4. Pass All Necessary Parameters During Export

I started from the config on GitHub, configs/table/SLANet_ch.yml, changed the data paths and added the pretrained model according to the fine-tuning instructions, and saved it as configs/table/SLANet_finetune.yml.

  5. Run Direct Inference on Exported Model

python3 tools/infer_table.py -c configs/table/SLANet_finetune.yml \
 -o Global.pretrained_model=../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_finetune \
 Global.infer_img=ppstructure/docs/table/table.jpg

This fails with: AssertionError: The ../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_finetune_infer/.pdparams does not exists!

tools/infer_table.py runs the training checkpoint (it looks for a .pdparams file); it does not support running the exported inference model.
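
The assertion above comes down to which files exist at the given path. The following sketch is a hypothetical helper for telling the two kinds of model path apart before choosing a script. It assumes a training checkpoint is passed as a prefix with a `.pdparams` file next to it (as the error message suggests) and that an exported directory contains `inference.pdiparams`. The second file name is an assumption about the export layout.

```python
from pathlib import Path


def describe_model_path(path):
    """Classify a path as a training checkpoint prefix or an exported inference directory."""
    p = Path(path)
    # Training checkpoints are referenced by prefix, e.g. .../best_accuracy -> best_accuracy.pdparams
    if Path(str(p) + ".pdparams").is_file():
        return "training checkpoint"
    # Assumed layout of an exported model directory.
    if p.is_dir() and (p / "inference.pdiparams").is_file():
        return "exported inference model"
    return "unknown"
```

Under these assumptions, a "training checkpoint" path belongs with tools/infer_table.py and an "exported inference model" path belongs with ppstructure/table/predict_structure.py.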

  6. Validate the Exported Model

Use the paddle_infer API to check whether the exported model is working as expected:

paddle_infer --model_dir=../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_finetune \
 --image_dir=ppstructure/docs/table/table.jpg

I did not use this.

  7. Check for Known Issues

I could not find the cause of my problem in those issues.

  8. Debug Static Graph Behavior

Some differences might arise due to the transition from a dynamic to a static graph. Try retraining with the Global.infer_mode=True option to force the model into inference mode during training:

Global: infer_mode: True -> Tested, but the result is still not correct.

  9. Test with a Pretrained Model

python3 tools/export_model.py -c configs/table/SLANet_ch.yml -o \
 Global.pretrained_model=../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_train/best_accuracy \
 Global.save_inference_dir=../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_train_infer

python3 tools/infer_table.py -c configs/table/SLANet_ch.yml -o \
 Global.pretrained_model=../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_train/best_accuracy Global.infer_img=ppstructure/docs/table/table.jpg

cd ppstructure/
python table/predict_structure.py \
 --table_model_dir=../../translate-pdf-scan/src/weights/table_rec/ch_ppstructure_mobile_v2.0_SLANet_train_infer \
 --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
 --image_dir=docs/table/table.jpg \
 --output=../output/

-> The result is still wrong.

PaddleOCR 2.8.1: I also tried installing it, but it didn't work.

I noticed that the values predicted by the exported model are very small, and there are many of them. The trained model predicts boxes with shape (8, 10), while the exported model outputs shape (50, 60). So the result has 50 values instead of the 8 coordinates of the xyxyxyxy format.
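
A shape check like the one described here can be automated. The sketch below is hypothetical and uses plain nested lists instead of framework tensors. It relies only on `loc_reg_num: 8` from the config: a box output should have one axis of length 8. One possible explanation for the mismatch, which nothing in this thread confirms, is that the exported model returns its outputs in a different order than the prediction script expects. Selecting the box output by shape rather than by position would expose that case.

```python
def shape_of(nested):
    """Shape of a rectangular nested list, e.g. 10 rows of 8 values -> (10, 8)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def pick_box_output(outputs, loc_reg_num=8):
    """Return the index of the single output that has an axis of length loc_reg_num."""
    matches = [i for i, out in enumerate(outputs) if loc_reg_num in shape_of(out)]
    if len(matches) != 1:
        shapes = [shape_of(out) for out in outputs]
        raise ValueError(f"expected exactly one box-like output, got shapes {shapes}")
    return matches[0]


# Dummy data with the shapes reported above; the values are placeholders.
boxes = [[0.0] * 10 for _ in range(8)]    # shape (8, 10)
other = [[0.0] * 60 for _ in range(50)]   # shape (50, 60)

print(pick_box_output([other, boxes]))  # 1
```

If no output has an axis of length 8, the function raises, which would point at the exported graph itself rather than at the ordering of its outputs.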