PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

Inference model / Trained model difference #1736

Closed swbliss closed 3 years ago

swbliss commented 3 years ago

Thank you in advance for your effort in providing such a helpful framework 😃👍

The inference model and the trained model give different results. I get the result I expect with the inference model, but not with the trained model. Since I plan to fine-tune starting from the trained model's checkpoint, I want to know what I am doing wrong.

Trained model with infer_rec.py

When I set the Global.checkpoints parameter:

```bash
(paddle-ocr) √ PaddleOCR % python tools/infer_rec.py -c configs/rec/multi_language/rec_en_number_lite_train.yml -o Global.checkpoints=./pretrain_models/en_number_mobile_v2.0_rec_train/best_accuracy Global.infer_img=./doc/imgs_words/en/word_1.png
[2021/01/12 02:17:19] root INFO: Architecture :
[2021/01/12 02:17:19] root INFO: Backbone :
[2021/01/12 02:17:19] root INFO: model_name : small
[2021/01/12 02:17:19] root INFO: name : MobileNetV3
[2021/01/12 02:17:19] root INFO: scale : 0.5
[2021/01/12 02:17:19] root INFO: small_stride : [1, 2, 2, 2]
[2021/01/12 02:17:19] root INFO: Head :
[2021/01/12 02:17:19] root INFO: fc_decay : 1e-05
[2021/01/12 02:17:19] root INFO: name : CTCHead
[2021/01/12 02:17:19] root INFO: Neck :
[2021/01/12 02:17:19] root INFO: encoder_type : rnn
[2021/01/12 02:17:19] root INFO: hidden_size : 48
[2021/01/12 02:17:19] root INFO: name : SequenceEncoder
[2021/01/12 02:17:19] root INFO: Transform : None
[2021/01/12 02:17:19] root INFO: algorithm : CRNN
[2021/01/12 02:17:19] root INFO: model_type : rec
[2021/01/12 02:17:19] root INFO: Eval :
[2021/01/12 02:17:19] root INFO: dataset :
[2021/01/12 02:17:19] root INFO: data_dir : ./train_data/yumi's_cells/
[2021/01/12 02:17:19] root INFO: label_file_list : ["./train_data/yumi's_cells/rec_gt_test.txt"]
[2021/01/12 02:17:19] root INFO: name : SimpleDataSet
[2021/01/12 02:17:19] root INFO: transforms :
[2021/01/12 02:17:19] root INFO: DecodeImage :
[2021/01/12 02:17:19] root INFO: channel_first : False
[2021/01/12 02:17:19] root INFO: img_mode : BGR
[2021/01/12 02:17:19] root INFO: CTCLabelEncode : None
[2021/01/12 02:17:19] root INFO: RecResizeImg :
[2021/01/12 02:17:19] root INFO: image_shape : [3, 32, 320]
[2021/01/12 02:17:19] root INFO: KeepKeys :
[2021/01/12 02:17:19] root INFO: keep_keys : ['image', 'label', 'length']
[2021/01/12 02:17:19] root INFO: loader :
[2021/01/12 02:17:19] root INFO: batch_size_per_card : 256
[2021/01/12 02:17:19] root INFO: drop_last : False
[2021/01/12 02:17:19] root INFO: num_workers : 8
[2021/01/12 02:17:19] root INFO: shuffle : False
[2021/01/12 02:17:19] root INFO: Global :
[2021/01/12 02:17:19] root INFO: cal_metric_during_train : True
[2021/01/12 02:17:19] root INFO: character_dict_path : ppocr/utils/dict/en_dict.txt
[2021/01/12 02:17:19] root INFO: character_type : ch
[2021/01/12 02:17:19] root INFO: checkpoints : ./pretrain_models/en_number_mobile_v2.0_rec_train/best_accuracy
[2021/01/12 02:17:19] root INFO: debug : False
[2021/01/12 02:17:19] root INFO: distributed : False
[2021/01/12 02:17:19] root INFO: epoch_num : 100
[2021/01/12 02:17:19] root INFO: eval_batch_step : [0, 2000]
[2021/01/12 02:17:19] root INFO: infer_img : ./doc/imgs_words/en/word_1.png
[2021/01/12 02:17:19] root INFO: infer_mode : False
[2021/01/12 02:17:19] root INFO: load_static_weights : False
[2021/01/12 02:17:19] root INFO: log_smooth_window : 20
[2021/01/12 02:17:19] root INFO: max_text_length : 25
[2021/01/12 02:17:19] root INFO: pretrained_model : None
[2021/01/12 02:17:19] root INFO: print_batch_step : 10
[2021/01/12 02:17:19] root INFO: save_epoch_step : 3
[2021/01/12 02:17:19] root INFO: save_inference_dir : None
[2021/01/12 02:17:19] root INFO: save_model_dir : ./output/rec_en_number_lite
[2021/01/12 02:17:19] root INFO: use_gpu : False
[2021/01/12 02:17:19] root INFO: use_space_char : True
[2021/01/12 02:17:19] root INFO: use_visualdl : False
[2021/01/12 02:17:19] root INFO: Loss :
[2021/01/12 02:17:19] root INFO: name : CTCLoss
[2021/01/12 02:17:19] root INFO: Metric :
[2021/01/12 02:17:19] root INFO: main_indicator : acc
[2021/01/12 02:17:19] root INFO: name : RecMetric
[2021/01/12 02:17:19] root INFO: Optimizer :
[2021/01/12 02:17:19] root INFO: beta1 : 0.9
[2021/01/12 02:17:19] root INFO: beta2 : 0.999
[2021/01/12 02:17:19] root INFO: lr :
[2021/01/12 02:17:19] root INFO: learning_rate : 0.001
[2021/01/12 02:17:19] root INFO: name : Cosine
[2021/01/12 02:17:19] root INFO: name : Adam
[2021/01/12 02:17:19] root INFO: regularizer :
[2021/01/12 02:17:19] root INFO: factor : 1e-05
[2021/01/12 02:17:19] root INFO: name : L2
[2021/01/12 02:17:19] root INFO: PostProcess :
[2021/01/12 02:17:19] root INFO: name : CTCLabelDecode
[2021/01/12 02:17:19] root INFO: Train :
[2021/01/12 02:17:19] root INFO: dataset :
[2021/01/12 02:17:19] root INFO: data_dir : ./train_data/yumi's_cells/
[2021/01/12 02:17:19] root INFO: label_file_list : ["./train_data/yumi's_cells/rec_gt_train.txt"]
[2021/01/12 02:17:19] root INFO: name : SimpleDataSet
[2021/01/12 02:17:19] root INFO: transforms :
[2021/01/12 02:17:19] root INFO: DecodeImage :
[2021/01/12 02:17:19] root INFO: channel_first : False
[2021/01/12 02:17:19] root INFO: img_mode : BGR
[2021/01/12 02:17:19] root INFO: RecAug : None
[2021/01/12 02:17:19] root INFO: CTCLabelEncode : None
[2021/01/12 02:17:19] root INFO: RecResizeImg :
[2021/01/12 02:17:19] root INFO: image_shape : [3, 32, 320]
[2021/01/12 02:17:19] root INFO: KeepKeys :
[2021/01/12 02:17:19] root INFO: keep_keys : ['image', 'label', 'length']
[2021/01/12 02:17:19] root INFO: loader :
[2021/01/12 02:17:19] root INFO: batch_size_per_card : 256
[2021/01/12 02:17:19] root INFO: drop_last : True
[2021/01/12 02:17:19] root INFO: num_workers : 8
[2021/01/12 02:17:19] root INFO: shuffle : True
[2021/01/12 02:17:19] root INFO: train with paddle 2.0.0-rc1 and device CPUPlace
[2021/01/12 02:17:19] root INFO: resume from ./pretrain_models/en_number_mobile_v2.0_rec_train/best_accuracy
./doc/imgs_words/en/word_1.png
[2021/01/12 02:17:19] root INFO: infer_img: ./doc/imgs_words/en/word_1.png
[2021/01/12 02:17:19] root INFO: result: ('DuAoumulud0XuoSku', 0.025586227)
[2021/01/12 02:17:19] root INFO: success!
```

When I set the Global.pretrained_model parameter:

```bash
(paddle-ocr) √ PaddleOCR % python tools/infer_rec.py -c configs/rec/multi_language/rec_en_number_lite_train.yml -o Global.pretrained_model=./pretrain_models/en_number_mobile_v2.0_rec_train/best_accuracy Global.infer_img=./doc/imgs_words/en/word_1.png
[2021/01/12 02:28:33] root INFO: Architecture :
[2021/01/12 02:28:33] root INFO: Backbone :
[2021/01/12 02:28:33] root INFO: model_name : small
[2021/01/12 02:28:33] root INFO: name : MobileNetV3
[2021/01/12 02:28:33] root INFO: scale : 0.5
[2021/01/12 02:28:33] root INFO: small_stride : [1, 2, 2, 2]
[2021/01/12 02:28:33] root INFO: Head :
[2021/01/12 02:28:33] root INFO: fc_decay : 1e-05
[2021/01/12 02:28:33] root INFO: name : CTCHead
[2021/01/12 02:28:33] root INFO: Neck :
[2021/01/12 02:28:33] root INFO: encoder_type : rnn
[2021/01/12 02:28:33] root INFO: hidden_size : 48
[2021/01/12 02:28:33] root INFO: name : SequenceEncoder
[2021/01/12 02:28:33] root INFO: Transform : None
[2021/01/12 02:28:33] root INFO: algorithm : CRNN
[2021/01/12 02:28:33] root INFO: model_type : rec
[2021/01/12 02:28:33] root INFO: Eval :
[2021/01/12 02:28:33] root INFO: dataset :
[2021/01/12 02:28:33] root INFO: data_dir : ./train_data/yumi's_cells/
[2021/01/12 02:28:33] root INFO: label_file_list : ["./train_data/yumi's_cells/rec_gt_test.txt"]
[2021/01/12 02:28:33] root INFO: name : SimpleDataSet
[2021/01/12 02:28:33] root INFO: transforms :
[2021/01/12 02:28:33] root INFO: DecodeImage :
[2021/01/12 02:28:33] root INFO: channel_first : False
[2021/01/12 02:28:33] root INFO: img_mode : BGR
[2021/01/12 02:28:33] root INFO: CTCLabelEncode : None
[2021/01/12 02:28:33] root INFO: RecResizeImg :
[2021/01/12 02:28:33] root INFO: image_shape : [3, 32, 320]
[2021/01/12 02:28:33] root INFO: KeepKeys :
[2021/01/12 02:28:33] root INFO: keep_keys : ['image', 'label', 'length']
[2021/01/12 02:28:33] root INFO: loader :
[2021/01/12 02:28:33] root INFO: batch_size_per_card : 256
[2021/01/12 02:28:33] root INFO: drop_last : False
[2021/01/12 02:28:33] root INFO: num_workers : 8
[2021/01/12 02:28:33] root INFO: shuffle : False
[2021/01/12 02:28:33] root INFO: Global :
[2021/01/12 02:28:33] root INFO: cal_metric_during_train : True
[2021/01/12 02:28:33] root INFO: character_dict_path : ppocr/utils/dict/en_dict.txt
[2021/01/12 02:28:33] root INFO: character_type : ch
[2021/01/12 02:28:33] root INFO: checkpoints : None
[2021/01/12 02:28:33] root INFO: debug : False
[2021/01/12 02:28:33] root INFO: distributed : False
[2021/01/12 02:28:33] root INFO: epoch_num : 100
[2021/01/12 02:28:33] root INFO: eval_batch_step : [0, 2000]
[2021/01/12 02:28:33] root INFO: infer_img : ./doc/imgs_words/en/word_1.png
[2021/01/12 02:28:33] root INFO: infer_mode : False
[2021/01/12 02:28:33] root INFO: load_static_weights : False
[2021/01/12 02:28:33] root INFO: log_smooth_window : 20
[2021/01/12 02:28:33] root INFO: max_text_length : 25
[2021/01/12 02:28:33] root INFO: pretrained_model : ./pretrain_models/en_number_mobile_v2.0_rec_train/best_accuracy
[2021/01/12 02:28:33] root INFO: print_batch_step : 10
[2021/01/12 02:28:33] root INFO: save_epoch_step : 3
[2021/01/12 02:28:33] root INFO: save_inference_dir : None
[2021/01/12 02:28:33] root INFO: save_model_dir : ./output/rec_en_number_lite
[2021/01/12 02:28:33] root INFO: use_gpu : False
[2021/01/12 02:28:33] root INFO: use_space_char : True
[2021/01/12 02:28:33] root INFO: use_visualdl : False
[2021/01/12 02:28:33] root INFO: Loss :
[2021/01/12 02:28:33] root INFO: name : CTCLoss
[2021/01/12 02:28:33] root INFO: Metric :
[2021/01/12 02:28:33] root INFO: main_indicator : acc
[2021/01/12 02:28:33] root INFO: name : RecMetric
[2021/01/12 02:28:33] root INFO: Optimizer :
[2021/01/12 02:28:33] root INFO: beta1 : 0.9
[2021/01/12 02:28:33] root INFO: beta2 : 0.999
[2021/01/12 02:28:33] root INFO: lr :
[2021/01/12 02:28:33] root INFO: learning_rate : 0.001
[2021/01/12 02:28:33] root INFO: name : Cosine
[2021/01/12 02:28:33] root INFO: name : Adam
[2021/01/12 02:28:33] root INFO: regularizer :
[2021/01/12 02:28:33] root INFO: factor : 1e-05
[2021/01/12 02:28:33] root INFO: name : L2
[2021/01/12 02:28:33] root INFO: PostProcess :
[2021/01/12 02:28:33] root INFO: name : CTCLabelDecode
[2021/01/12 02:28:33] root INFO: Train :
[2021/01/12 02:28:33] root INFO: dataset :
[2021/01/12 02:28:33] root INFO: data_dir : ./train_data/yumi's_cells/
[2021/01/12 02:28:33] root INFO: label_file_list : ["./train_data/yumi's_cells/rec_gt_train.txt"]
[2021/01/12 02:28:33] root INFO: name : SimpleDataSet
[2021/01/12 02:28:33] root INFO: transforms :
[2021/01/12 02:28:33] root INFO: DecodeImage :
[2021/01/12 02:28:33] root INFO: channel_first : False
[2021/01/12 02:28:33] root INFO: img_mode : BGR
[2021/01/12 02:28:33] root INFO: RecAug : None
[2021/01/12 02:28:33] root INFO: CTCLabelEncode : None
[2021/01/12 02:28:33] root INFO: RecResizeImg :
[2021/01/12 02:28:33] root INFO: image_shape : [3, 32, 320]
[2021/01/12 02:28:33] root INFO: KeepKeys :
[2021/01/12 02:28:33] root INFO: keep_keys : ['image', 'label', 'length']
[2021/01/12 02:28:33] root INFO: loader :
[2021/01/12 02:28:33] root INFO: batch_size_per_card : 256
[2021/01/12 02:28:33] root INFO: drop_last : True
[2021/01/12 02:28:33] root INFO: num_workers : 8
[2021/01/12 02:28:33] root INFO: shuffle : True
[2021/01/12 02:28:33] root INFO: train with paddle 2.0.0-rc1 and device CPUPlace
[2021/01/12 02:28:33] root INFO: load pretrained model from ['./pretrain_models/en_number_mobile_v2.0_rec_train/best_accuracy']
./doc/imgs_words/en/word_1.png
[2021/01/12 02:28:33] root INFO: infer_img: ./doc/imgs_words/en/word_1.png
[2021/01/12 02:28:33] root INFO: result: ('UozaOmO s smoH', 0.02520519)
[2021/01/12 02:28:33] root INFO: success!
```

Inference model with predict_rec.py

This is the expected result:

(paddle-ocr) √ PaddleOCR % python tools/infer/predict_rec.py --image_dir=./doc/imgs_words/en/word_1.png --rec_model_dir=./inference/en_number_mobile_v2.0_rec_infer/ --use_space_char=True --rec_char_dict_path=./ppocr/utils/dict/en_dict.txt
./doc/imgs_words/en/word_1.png
E0112 02:19:38.341466 300703168 analysis_config.cc:73] Please compile with gpu to EnableGpu()
[2021/01/12 02:19:38] root INFO: Predicts of ./doc/imgs_words/en/word_1.png:('JOINT', 0.92921716)
[2021/01/12 02:19:38] root INFO: Total predict time for 1 images, cost: 0.009
littletomatodonkey commented 3 years ago

Thanks for your attention. I used the latest code and Paddle 2.0.0rc1, and the result is correct, as shown below. Maybe you should try the latest code from the repo.

image

swbliss commented 3 years ago

@littletomatodonkey Thank you for the fast reply. Can you share the commands you used and configuration info? I got the same result even with the latest code. I think I'm doing something wrong.

littletomatodonkey commented 3 years ago

The command is as follows.

python tools/infer_rec.py -c configs/rec/multi_language/rec_en_number_lite_train.yml -o Global.pretrained_model=/paddle/models/ocr/dyg/v1.1_dyg/en_number_mobile_v2.0_rec_train/best_accuracy Global.infer_img=./doc/imgs_words/en/word_1.png Global.use_gpu=False
swbliss commented 3 years ago

Compared to your command, I had added the Global.use_space_char=True option (see below). When I remove this option, it works properly. Should I fine-tune the model on a custom dataset in order to use the space character?

(paddle-ocr) √ PaddleOCR % python tools/infer_rec.py -c configs/rec/multi_language/rec_en_number_lite_train.yml -o Global.pretrained_model=./pretrain_models/en_number_mobile_v2.0_rec_train/best_accuracy Global.infer_img=./doc/imgs_words/en/word_1.png Global.use_gpu=False Global.use_space_char=True
littletomatodonkey commented 3 years ago

Yes, if you want the model to recognize the space character, you should add the flag and fine-tune on your own dataset.
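
One plausible reading of why the flag matters, sketched under assumptions: a CTC recognizer predicts one class per dictionary character plus a blank, so turning on the space character adds one more class than the released weights were trained for. The helper `build_charset` and the toy dictionary below are hypothetical and only illustrate the size change; they are not PaddleOCR's actual code, and the real `en_dict.txt` has more entries.

```python
from typing import List


def build_charset(dict_chars: List[str], use_space_char: bool) -> List[str]:
    """Hypothetical helper: build the label set a CTC decoder indexes into."""
    chars = list(dict_chars)
    if use_space_char:
        # Assumption: the space is appended after the dictionary entries.
        chars.append(" ")
    # Assumption: index 0 is reserved for the CTC blank symbol.
    return ["blank"] + chars


# Toy stand-in for the character dictionary (not the real file contents).
toy_dict = list("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")

without_space = build_charset(toy_dict, use_space_char=False)
with_space = build_charset(toy_dict, use_space_char=True)

# The label set grows by exactly one class when the space is enabled.
print(len(without_space), len(with_space))
```

If the output layer is sized from this label set, weights trained without the space character would no longer match its shape, which would be consistent with the low-confidence results above and with the advice to fine-tune after adding the flag.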

swbliss commented 3 years ago

@littletomatodonkey Thank you so much. I'm gonna try :)

rsingh2083 commented 3 years ago

@littletomatodonkey Thank you so much. I'm gonna try :)

Can you please tell me which version of paddle-gpu you used for training? My 2.1.1 doesn't work at all.