PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

CPPD Model with space character #13774

Closed: skyimager closed this issue 2 months ago

skyimager commented 2 months ago


🐛 Bug

I am trying to fine-tune the CPPD model (rec_svtrnet_cppd_base_u14m.yml) on a custom dataset, and I want the model to recognize the space character as well.

My config is as follows:

Global:
  use_gpu: True
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec/svtr_cppd_base_custom/
  save_epoch_step: 1
  # evaluation is run every 100 iterations after the 0th iteration
  eval_batch_step: [0, 100]
  cal_metric_during_train: True
  pretrained_model: ./pretrained_models/rec_svtr_cppd_base_u14m_train/best_model.pdparams
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  # for data or label process
  character_dict_path: ./ppocr/utils/ic15_dict.txt
  character_type: en
  max_text_length: 25
  infer_mode: False
  use_space_char: True
  save_res_path: ./output/rec/predicts_svtr_cppd_base_u14m.txt
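The eval_batch_step comment is easier to check against the value with a small model of how the setting is read. The sketch below is a simplified, hypothetical helper (should_evaluate is not a PaddleOCR function). It assumes eval_batch_step is a [start, interval] pair and that evaluation fires once the global step has passed start and then every interval steps. The real training loop may add further conditions.

```python
from typing import Sequence


def should_evaluate(global_step: int, eval_batch_step: Sequence[int]) -> bool:
    """Return True if evaluation would run at this step.

    Assumption: eval_batch_step is [start_step, interval]; evaluation runs
    once global_step is past start_step, then every `interval` steps.
    """
    start_step, interval = eval_batch_step
    return global_step > start_step and (global_step - start_step) % interval == 0


# With the config above (eval_batch_step: [0, 100]):
steps = [s for s in range(1, 501) if should_evaluate(s, [0, 100])]
print(steps)  # [100, 200, 300, 400, 500]
```

Under this reading, [0, 100] means "every 100 iterations", not every 2000.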

The preds key of the model output has the shape (Batch, 26, 38).
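Both numbers in that shape can be reproduced by hand, which helps confirm that the space character really made it into the class list. The sketch below rests on assumptions that are not stated in the report: that ic15_dict.txt holds the 36 characters 0-9 and a-z, that the space is appended after the dictionary characters, and that the decoder prepends a single end-of-sequence token "</s>". It also assumes one extra time step for that end token. build_cppd_charset is a hypothetical helper, not PaddleOCR code.

```python
from typing import List

# Assumption: ic15_dict.txt lists the 36 characters 0-9 and a-z, one per line.
IC15_CHARS = list("0123456789abcdefghijklmnopqrstuvwxyz")


def build_cppd_charset(dict_chars: List[str], use_space_char: bool) -> List[str]:
    """Hypothetical helper: rebuild the class list the decoder is assumed to use.

    Assumes the space is appended after the dictionary characters and a
    single end-of-sequence token "</s>" is prepended at index 0.
    """
    chars = list(dict_chars)
    if use_space_char:
        chars.append(" ")
    return ["</s>"] + chars


max_text_length = 25
charset = build_cppd_charset(IC15_CHARS, use_space_char=True)

time_steps = max_text_length + 1  # assumed: one extra step for the end token
num_classes = len(charset)        # 1 end token + 36 characters + 1 space
print((time_steps, num_classes))  # (26, 38)
```

Under these assumptions, the same dictionary without use_space_char would give 37 classes, so a 38-wide output is consistent with the space having been included during training.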

During training, I get good accuracy on both the training and validation sets.

At evaluation time, however, my output is full of weird numbers. I suspect that CPPDLabelEncode cannot adapt to the 38-class output representation. Is there anything I am missing?
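In PaddleOCR the label encoder prepares training targets, while a separate post-processing decoder maps predicted indices back to text, so one way to narrow this down is to decode the raw preds by hand and compare. The plain-Python sketch below uses hypothetical names (CHARSET, greedy_decode, one_hot) and is not PaddleOCR's own post-processing. It assumes a 38-entry class list with the end token "</s>" at index 0, the 36 characters 0-9 and a-z, and the space last. It also checks that the model's class dimension matches the charset size, a mismatch that would make every index map to the wrong character.

```python
from typing import List, Sequence

# Assumed class list: "</s>" at index 0, 36 dictionary characters, space last.
CHARSET = ["</s>"] + list("0123456789abcdefghijklmnopqrstuvwxyz") + [" "]


def greedy_decode(probs: Sequence[Sequence[float]], charset: List[str]) -> str:
    """Decode one sample of shape (time_steps, num_classes) by argmax.

    Stops at the first end-of-sequence token. Repeated characters are kept
    (no CTC-style collapsing), assuming an attention-style decoder.
    """
    if probs and len(probs[0]) != len(charset):
        raise ValueError(
            f"model has {len(probs[0])} classes but charset has {len(charset)}"
        )
    out = []
    for step in probs:
        idx = max(range(len(step)), key=step.__getitem__)  # argmax over classes
        token = charset[idx]
        if token == "</s>":
            break
        out.append(token)
    return "".join(out)


def one_hot(char: str, charset: List[str]) -> List[float]:
    """Build a toy probability row that puts all mass on `char`."""
    row = [0.0] * len(charset)
    row[charset.index(char)] = 1.0
    return row


toy = [one_hot(c, CHARSET) for c in ["h", "i", " ", "5", "</s>", "x"]]
print(repr(greedy_decode(toy, CHARSET)))  # 'hi 5'
```

Running the same function on one real preds sample (converted to nested lists) and comparing it with the pipeline's output would show whether the problem lies in the model or in the decoding configuration.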

🏃‍♂️ Environment

Linux Ubuntu 20

🌰 Minimal Reproducible Example

Inference settings: rec_algorithm 'CPPD', rec_image_shape '3,32,128', rec_char_dict_path ./ppocr/utils/ic15_dict.txt, use_space True
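Since training and inference must rebuild the same class list, a quick cross-check of the two settings can rule out a silent mismatch. The sketch below is hypothetical: KEY_MAP and find_mismatches are illustrative names, and the inference option names are taken as written in the reproduction ("use_space"). The real inference script may spell them differently, so adjust the mapping to whatever names it actually parses.

```python
from typing import Any, Dict, List

# Hypothetical mapping from training-config keys (Global section) to the
# inference option names as written in the reproduction above.
KEY_MAP = {
    "character_dict_path": "rec_char_dict_path",
    "use_space_char": "use_space",
}


def find_mismatches(train: Dict[str, Any], infer: Dict[str, Any]) -> List[str]:
    """List the training settings that inference does not reproduce exactly."""
    problems = []
    for train_key, infer_key in KEY_MAP.items():
        if infer_key not in infer:
            problems.append(f"{infer_key} missing (training: {train[train_key]!r})")
        elif infer[infer_key] != train[train_key]:
            problems.append(
                f"{infer_key}={infer[infer_key]!r} != training {train_key}={train[train_key]!r}"
            )
    return problems


train_global = {"character_dict_path": "./ppocr/utils/ic15_dict.txt", "use_space_char": True}
infer_opts = {"rec_char_dict_path": "./ppocr/utils/ic15_dict.txt", "use_space": True}
print(find_mismatches(train_global, infer_opts))  # []

# Dropping the space option is reported:
print(find_mismatches(train_global, {"rec_char_dict_path": "./ppocr/utils/ic15_dict.txt"}))
# ['use_space missing (training: True)']
```

An empty list only means the listed settings agree. It does not check the exported model itself.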

Topdu commented 2 months ago

Please post some weird numbers in your output.