piarosebelledelapaz commented 1 month ago

Please provide the following information to quickly locate the problem

System Environment: Windows 11
Version: Paddle: PaddleOCR:
Related components:
Command Code:

python tools/train.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model=./pretrained_models/rec_r50_vd_srn_train/best_accuracy

Complete Error Message:

[2024/05/04 03:21:49] ppocr INFO: Architecture :
[2024/05/04 03:21:49] ppocr INFO: Backbone :
[2024/05/04 03:21:49] ppocr INFO: name : ResNetFPN
[2024/05/04 03:21:49] ppocr INFO: Head :
[2024/05/04 03:21:49] ppocr INFO: hidden_dims : 512
[2024/05/04 03:21:49] ppocr INFO: max_text_length : 25
[2024/05/04 03:21:49] ppocr INFO: name : SRNHead
[2024/05/04 03:21:49] ppocr INFO: num_decoder_TUs : 4
[2024/05/04 03:21:49] ppocr INFO: num_encoder_TUs : 2
[2024/05/04 03:21:49] ppocr INFO: num_heads : 8
[2024/05/04 03:21:49] ppocr INFO: Transform : None
[2024/05/04 03:21:49] ppocr INFO: algorithm : SRN
[2024/05/04 03:21:49] ppocr INFO: in_channels : 1
[2024/05/04 03:21:49] ppocr INFO: model_type : rec
[2024/05/04 03:21:49] ppocr INFO: Eval :
[2024/05/04 03:21:49] ppocr INFO: dataset :
[2024/05/04 03:21:49] ppocr INFO: data_dir : ./dataset/recognition/v2_img_eval_rec/
[2024/05/04 03:21:49] ppocr INFO: label_file_list : ['./dataset/recognition/v2_rec_gt_eval.txt']
[2024/05/04 03:21:49] ppocr INFO: name : SimpleDataSet
[2024/05/04 03:21:49] ppocr INFO: transforms :
[2024/05/04 03:21:49] ppocr INFO: DecodeImage :
[2024/05/04 03:21:49] ppocr INFO: channel_first : False
[2024/05/04 03:21:49] ppocr INFO: img_mode : BGR
[2024/05/04 03:21:49] ppocr INFO: SRNLabelEncode : None
[2024/05/04 03:21:49] ppocr INFO: SRNRecResizeImg :
[2024/05/04 03:21:49] ppocr INFO: image_shape : [1, 64, 256]
[2024/05/04 03:21:49] ppocr INFO: KeepKeys :
[2024/05/04 03:21:49] ppocr INFO: keep_keys : ['image', 'label', 'length', 'encoder_word_pos', 'gsrm_word_pos', 'gsrm_slf_attn_bias1', 'gsrm_slf_attn_bias2']
[2024/05/04 03:21:49] ppocr INFO: loader :
[2024/05/04 03:21:49] ppocr INFO: batch_size_per_card : 64
[2024/05/04 03:21:49] ppocr INFO: drop_last : False
[2024/05/04 03:21:49] ppocr INFO: num_workers : 4
[2024/05/04 03:21:49] ppocr INFO: shuffle : False
[2024/05/04 03:21:49] ppocr INFO: Global :
[2024/05/04 03:21:49] ppocr INFO: cal_metric_during_train : True
[2024/05/04 03:21:49] ppocr INFO: character_dict_path : ./ppocr/utils/dict/latin_dict.txt
[2024/05/04 03:21:49] ppocr INFO: checkpoints : None
[2024/05/04 03:21:49] ppocr INFO: distributed : False
[2024/05/04 03:21:49] ppocr INFO: epoch_num : 2
[2024/05/04 03:21:49] ppocr INFO: eval_batch_step : [0, 5000]
[2024/05/04 03:21:49] ppocr INFO: infer_img : doc/imgs_words/ch/word_1.jpg
[2024/05/04 03:21:49] ppocr INFO: infer_mode : False
[2024/05/04 03:21:49] ppocr INFO: log_smooth_window : 20
[2024/05/04 03:21:49] ppocr INFO: max_text_length : 25
[2024/05/04 03:21:49] ppocr INFO: num_heads : 8
[2024/05/04 03:21:49] ppocr INFO: pretrained_model : ./pretrained_models/rec_r50_vd_srn_train/best_accuracy
[2024/05/04 03:21:49] ppocr INFO: print_batch_step : 20
[2024/05/04 03:21:49] ppocr INFO: save_epoch_step : 1
[2024/05/04 03:21:49] ppocr INFO: save_inference_dir : None
[2024/05/04 03:21:49] ppocr INFO: save_model_dir : ./output/rec_resnet50
[2024/05/04 03:21:49] ppocr INFO: save_res_path : ./output/rec/predicts_srn.txt
[2024/05/04 03:21:49] ppocr INFO: use_gpu : False
[2024/05/04 03:21:49] ppocr INFO: use_space_char : True
[2024/05/04 03:21:49] ppocr INFO: use_visualdl : False
[2024/05/04 03:21:49] ppocr INFO: Loss :
[2024/05/04 03:21:49] ppocr INFO: name : SRNLoss
[2024/05/04 03:21:49] ppocr INFO: Metric :
[2024/05/04 03:21:49] ppocr INFO: main_indicator : acc
[2024/05/04 03:21:49] ppocr INFO: name : RecMetric
[2024/05/04 03:21:49] ppocr INFO: Optimizer :
[2024/05/04 03:21:49] ppocr INFO: beta1 : 0.9
[2024/05/04 03:21:49] ppocr INFO: beta2 : 0.999
[2024/05/04 03:21:49] ppocr INFO: clip_norm : 10.0
[2024/05/04 03:21:49] ppocr INFO: lr :
[2024/05/04 03:21:49] ppocr INFO: learning_rate : 0.0001
[2024/05/04 03:21:49] ppocr INFO: name : Adam
[2024/05/04 03:21:49] ppocr INFO: PostProcess :
[2024/05/04 03:21:49] ppocr INFO: name : SRNLabelDecode
[2024/05/04 03:21:49] ppocr INFO: Train :
[2024/05/04 03:21:49] ppocr INFO: dataset :
[2024/05/04 03:21:49] ppocr INFO: data_dir : ./dataset/recognition/v2_img_train_rec/
[2024/05/04 03:21:49] ppocr INFO: label_file_list : ['./dataset/recognition/v2_rec_gt_train.txt']
[2024/05/04 03:21:49] ppocr INFO: name : SimpleDataSet
[2024/05/04 03:21:49] ppocr INFO: transforms :
[2024/05/04 03:21:49] ppocr INFO: DecodeImage :
[2024/05/04 03:21:49] ppocr INFO: channel_first : False
[2024/05/04 03:21:49] ppocr INFO: img_mode : BGR
[2024/05/04 03:21:49] ppocr INFO: SRNLabelEncode : None
[2024/05/04 03:21:49] ppocr INFO: SRNRecResizeImg :
[2024/05/04 03:21:49] ppocr INFO: image_shape : [1, 64, 256]
[2024/05/04 03:21:49] ppocr INFO: KeepKeys :
[2024/05/04 03:21:49] ppocr INFO: keep_keys : ['image', 'label', 'length', 'encoder_word_pos', 'gsrm_word_pos', 'gsrm_slf_attn_bias1', 'gsrm_slf_attn_bias2']
[2024/05/04 03:21:49] ppocr INFO: loader :
[2024/05/04 03:21:49] ppocr INFO: batch_size_per_card : 64
[2024/05/04 03:21:49] ppocr INFO: drop_last : False
[2024/05/04 03:21:49] ppocr INFO: num_workers : 8
[2024/05/04 03:21:49] ppocr INFO: shuffle : False
[2024/05/04 03:21:49] ppocr INFO: profiler_options : None
[2024/05/04 03:21:49] ppocr INFO: train with paddle 2.5.0 and device Place(cpu)
[2024/05/04 03:21:49] ppocr INFO: Initialize indexs of datasets:['./dataset/recognition/v2_rec_gt_train.txt']
list index out of range
[2024/05/04 03:21:49] ppocr INFO: Initialize indexs of datasets:['./dataset/recognition/v2_rec_gt_eval.txt']
[2024/05/04 03:21:51] ppocr INFO: train dataloader has 136 iters
[2024/05/04 03:21:51] ppocr INFO: valid dataloader has 34 iters
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.fc0.weight [512, 188] not matched with loaded params head.gsrm.fc0.weight [512, 38] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.fc0.bias [188] not matched with loaded params head.gsrm.fc0.bias [38] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.wrap_encoder0.prepare_decoder.emb0.weight [189, 512] not matched with loaded params head.gsrm.wrap_encoder0.prepare_decoder.emb0.weight [39, 512] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.wrap_encoder1.prepare_decoder.emb0.weight [189, 512] not matched with loaded params head.gsrm.wrap_encoder1.prepare_decoder.emb0.weight [39, 512] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.vsfd.fc1.weight [512, 188] not matched with loaded params head.vsfd.fc1.weight [512, 38] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.vsfd.fc1.bias [188] not matched with loaded params head.vsfd.fc1.bias [38] !
[2024/05/04 03:21:53] ppocr INFO: load pretrain successful from ./pretrained_models/rec_r50_vd_srn_train/best_accuracy
[2024/05/04 03:21:53] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 5000 iterations

Hello, I am trying to fine-tune the SRN recognition model in PaddleOCR, and I get an error saying that the shapes of the model params do not match the loaded params. I followed this guide for the configuration file and for downloading the model: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/algorithm_rec_srn_en.md

I also checked the issues about the yml file, because the config file currently provided in the repo does not work as-is: it uses LMDBDataSet, which must be changed to SimpleDataSet. I did what the others did, yet I'm still getting the error.

Any help or advice on what to do?

piarosebelledelapaz commented 1 month ago

The config file I used is below:

Global:
  use_gpu: False
  epoch_num: 2
  log_smooth_window: 20
  print_batch_step: 20
  save_model_dir: ./output/rec_resnet50
  save_epoch_step: 1
  # evaluation is run every 5000 iterations after the 4000th iteration
  eval_batch_step: [0, 5000]
  cal_metric_during_train: True
  pretrained_model: ./pretrained_models/rec_r50_vd_srn_train
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_words/ch/word_1.jpg
  # for data or label process
  character_dict_path: ./ppocr/utils/dict/latin_dict.txt
  max_text_length: 25
  num_heads: 8
  infer_mode: False
  use_space_char: True
  save_res_path: ./output/rec/predicts_srn.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  clip_norm: 10.0
  lr:
    learning_rate: 0.0001

Architecture:
  model_type: rec
  algorithm: SRN
  in_channels: 1
  Transform:
  Backbone:
    name: ResNetFPN
  Head:
    name: SRNHead
    max_text_length: 25
    num_heads: 8
    num_encoder_TUs: 2
    num_decoder_TUs: 4
    hidden_dims: 512

Loss:
  name: SRNLoss

PostProcess:
  name: SRNLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/recognition/v2_img_train_rec/
    label_file_list: ["./dataset/recognition/v2_rec_gt_train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - SRNLabelEncode: # Class handling label
      - SRNRecResizeImg:
          image_shape: [1, 64, 256]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length', 'encoder_word_pos', 'gsrm_word_pos', 'gsrm_slf_attn_bias1', 'gsrm_slf_attn_bias2'] # dataloader will return list in this order
  loader:
    shuffle: False
    batch_size_per_card: 64
    drop_last: False
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/recognition/v2_img_eval_rec/
    label_file_list: ["./dataset/recognition/v2_rec_gt_eval.txt"]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - SRNLabelEncode:
      - SRNRecResizeImg:
          image_shape: [1, 64, 256]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length', 'encoder_word_pos', 'gsrm_word_pos', 'gsrm_slf_attn_bias1', 'gsrm_slf_attn_bias2']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 64
    num_workers: 4

zhangyubo0722 commented 1 month ago

Have you replaced the ./ppocr/utils/dict/latin_dict.txt dictionary file with a dictionary file for your own task?

piarosebelledelapaz commented 1 month ago

I am specifically using ./ppocr/utils/dict/latin_dict.txt for my task.

zhangyubo0722 commented 1 month ago

If the length of your dictionary does not match the length of the pre-trained dictionary, it will result in dimension mismatch when loading the weights.
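A minimal sketch of where the mismatch comes from, assuming the head's output size is the number of dictionary characters, plus one for the space character when use_space_char is set, plus two special tokens for SRN. These counting rules are an assumption inferred from the shapes in the log above (188 vs. 38), and `expected_num_classes` is a hypothetical helper, not a PaddleOCR function.

```python
import os
import tempfile


def expected_num_classes(dict_path, use_space_char=True, num_special_tokens=2):
    """Estimate the output size of the recognition head for a dictionary file.

    Assumptions (not taken from the PaddleOCR source): every non-empty line
    of the file is one character, a space is appended when use_space_char is
    True, and SRN appends num_special_tokens extra tokens.
    """
    with open(dict_path, encoding="utf-8") as f:
        chars = [line.rstrip("\r\n") for line in f]
    chars = [c for c in chars if c != ""]
    if use_space_char:
        chars.append(" ")
    return len(chars) + num_special_tokens


# Demo with a throwaway 36-character dictionary (digits + lowercase letters).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_dict.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("\n".join("0123456789abcdefghijklmnopqrstuvwxyz") + "\n")
    # 36 characters + 2 special tokens
    print(expected_num_classes(demo_path, use_space_char=False))
```

Under these assumptions, the 38 in the loaded weights would be consistent with a 36-character alphabet, and the 188 in the new model with a 185-line dictionary plus the space character. Running the helper on ./ppocr/utils/dict/latin_dict.txt is a quick way to check the second figure.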

piarosebelledelapaz commented 1 month ago

So does this mean that if I plan to use the Latin dictionary, I have to train the model from scratch?

zhangyubo0722 commented 1 month ago

You can load a pretrained model and train it from scratch.

piarosebelledelapaz commented 1 month ago

I initially tried loading the pretrained model with this command:

python tools/train.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model=./pretrained_models/rec_r50_vd_srn_train/best_accuracy

and I get this error:

[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.wrap_encoder0.prepare_decoder.emb0.weight [189, 512] not matched with loaded params head.gsrm.wrap_encoder0.prepare_decoder.emb0.weight [39, 512] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.wrap_encoder1.prepare_decoder.emb0.weight [189, 512] not matched with loaded params head.gsrm.wrap_encoder1.prepare_decoder.emb0.weight [39, 512] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.vsfd.fc1.weight [512, 188] not matched with loaded params head.vsfd.fc1.weight [512, 38] !

Is there a way to train the model from scratch, or am I using the wrong command?

I just noticed that the config for this model does not set a dictionary path by default (https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/rec_r50_fpn_srn.yml). Which dictionary was the pre-trained model trained on?

piarosebelledelapaz commented 1 month ago

Also, regarding training the recognition model with ResNet50 as the backbone: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/algorithm_rec_srn_en.md says that the model was trained on two text recognition datasets, MJSynth and SynthText. If I plan to train the model from scratch, how did you do this? Did you combine the two datasets and then train the model, or train on one dataset first and then retrain on the second?

Could you provide more info on this?

zhangyubo0722 commented 1 month ago

You can ignore the warning of weight mismatch and use your own dictionary, which essentially means retraining the model. Specifically, you can refer to the following documentation:

https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/recognition_en.md
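To illustrate what ignoring the warning amounts to, here is a small sketch that splits parameters into those whose shapes match the checkpoint (and can be reused) and those that do not (and stay at their fresh initialization). It is an illustration under the assumption that mismatched parameters are simply skipped on load, not PaddleOCR's actual loading code; `split_by_shape` and the `backbone.example.weight` entry are hypothetical, while the head shapes are copied from the warnings in this thread.

```python
def split_by_shape(model_shapes, loaded_shapes):
    """Return (loadable, skipped) parameter names.

    A parameter is loadable only if the checkpoint has an entry with the
    same name and exactly the same shape; everything else is skipped and
    would keep its randomly initialized values.
    """
    loadable, skipped = [], []
    for name, shape in model_shapes.items():
        loaded = loaded_shapes.get(name)
        if loaded is not None and tuple(loaded) == tuple(shape):
            loadable.append(name)
        else:
            skipped.append(name)
    return loadable, skipped


# Shapes of the freshly built model (188 classes for the new dictionary).
model_shapes = {
    "backbone.example.weight": (64, 1, 7, 7),  # hypothetical backbone entry
    "head.gsrm.fc0.weight": (512, 188),
    "head.gsrm.fc0.bias": (188,),
    "head.vsfd.fc1.weight": (512, 188),
    "head.vsfd.fc1.bias": (188,),
}

# Shapes stored in the pretrained checkpoint (38 classes).
loaded_shapes = {
    "backbone.example.weight": (64, 1, 7, 7),  # hypothetical backbone entry
    "head.gsrm.fc0.weight": (512, 38),
    "head.gsrm.fc0.bias": (38,),
    "head.vsfd.fc1.weight": (512, 38),
    "head.vsfd.fc1.bias": (38,),
}

loadable, skipped = split_by_shape(model_shapes, loaded_shapes)
print("reused from checkpoint:", loadable)
print("trained from scratch:", skipped)
```

In this picture only the dictionary-dependent layers in the head are retrained from random initialization, while layers whose shapes do not depend on the dictionary can still start from the pretrained weights.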

PaddlePaddle / PaddleOCR

SRN Recognition Model Training Warning --> ppocr WARNING: The shape of model params head.gsrm.wrap_encoder0.prepare_decoder.emb0.weight [189, 512] not matched with loaded params head.gsrm.wrap_encoder0.prepare_decoder.emb0.weight [39, 512] ! #12047
