piarosebelledelapaz closed this issue 4 weeks ago.
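The config file I used is:

Global:
  use_gpu: False
  epoch_num: 2
  log_smooth_window: 20
  print_batch_step: 20
  save_model_dir: ./output/rec_resnet50
  save_epoch_step: 1
  eval_batch_step: [0, 5000]
  cal_metric_during_train: True
  pretrained_model: ./pretrained_models/rec_r50_vd_srn_train
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ./ppocr/utils/dict/latin_dict.txt
  max_text_length: 25
  num_heads: 8
  infer_mode: False
  use_space_char: True
  save_res_path: ./output/rec/predicts_srn.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  clip_norm: 10.0
  lr:
    learning_rate: 0.0001

Architecture:
  model_type: rec
  algorithm: SRN
  in_channels: 1
  Transform:
  Backbone:
    name: ResNetFPN
  Head:
    name: SRNHead
    max_text_length: 25
    num_heads: 8
    num_encoder_TUs: 2
    num_decoder_TUs: 4
    hidden_dims: 512

Loss:
  name: SRNLoss

PostProcess:
  name: SRNLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/recognition/v2_img_train_rec/
    label_file_list: ["./dataset/recognition/v2_rec_gt_train.txt"]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - SRNLabelEncode:
      - SRNRecResizeImg:
          image_shape: [1, 64, 256]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length', 'encoder_word_pos', 'gsrm_word_pos', 'gsrm_slf_attn_bias1', 'gsrm_slf_attn_bias2']
  loader:
    shuffle: False
    batch_size_per_card: 64
    drop_last: False
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/recognition/v2_img_eval_rec/
    label_file_list: ["./dataset/recognition/v2_rec_gt_eval.txt"]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - SRNLabelEncode:
      - SRNRecResizeImg:
          image_shape: [1, 64, 256]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length', 'encoder_word_pos', 'gsrm_word_pos', 'gsrm_slf_attn_bias1', 'gsrm_slf_attn_bias2']
  loader:
    shuffle: False
    batch_size_per_card: 64
    drop_last: False
    num_workers: 4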
Have you replaced the ./ppocr/utils/dict/latin_dict.txt dictionary file with a dictionary file for your own task?
I am specifically using ./ppocr/utils/dict/latin_dict.txt for my task.
If the length of your dictionary does not match the length of the pretrained dictionary, loading the weights will result in a dimension mismatch.
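To make the mismatch concrete, here is a minimal Python sketch of how the size of the SRN head could follow from the character dictionary. It rests on assumptions inferred from the shapes in the warnings rather than taken from the PaddleOCR source: that a config without `character_dict_path` falls back to the 36 characters 0-9 and a-z, that a space is appended when `use_space_char` is True, that SRN adds two special tokens (start and end of sequence), and that the GSRM embedding tables have one extra row. `srn_head_sizes` and `load_dict_chars` are hypothetical helpers.

```python
import string

# Assumption: with no character_dict_path, the decoder falls back to digits plus
# lowercase ASCII letters (36 characters), and no space is added.
DEFAULT_CHARS = string.digits + string.ascii_lowercase

# Assumption: SRN appends two special tokens (start and end of sequence).
SRN_SPECIAL_TOKENS = 2


def load_dict_chars(path):
    """Read a dictionary file with one character per line (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f]


def srn_head_sizes(dict_chars=None, use_space_char=False):
    """Return (num_classes, embedding_rows) under the assumptions above.

    dict_chars: list of dictionary entries, or None for the default character set.
    """
    if dict_chars is None:
        chars = list(DEFAULT_CHARS)
    else:
        chars = list(dict_chars)
        if use_space_char:
            chars.append(" ")
    num_classes = len(chars) + SRN_SPECIAL_TOKENS
    # Assumption: the GSRM embedding tables carry one extra row.
    embedding_rows = num_classes + 1
    return num_classes, embedding_rows


print(srn_head_sizes())  # (38, 39)
```

With the default character set this gives (38, 39), the same sizes as the pretrained parameters in the warnings ([512, 38] and [39, 512]), which is consistent with (though not proof of) the pretrained model using that default set. Calling `srn_head_sizes(load_dict_chars("./ppocr/utils/dict/latin_dict.txt"), use_space_char=True)` would give the sizes the new head needs, so any dictionary that does not yield exactly 38 classes produces the mismatch.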
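So does this mean that if I plan to use the Latin dictionary, I have to train the model from scratch?
You can still load the pretrained model: the parameters whose shapes match are loaded from it, while the layers whose shapes depend on the dictionary are trained from scratch.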
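Loading a pretrained checkpoint while training part of the model from scratch can be pictured as a shape filter: parameters whose shapes agree are copied from the checkpoint, and the rest keep their fresh initialization. The sketch below is a framework-free illustration using shape tuples only; it mirrors what the warnings suggest happens, it is not PaddleOCR's own loading code, and `split_by_shape` plus the backbone parameter name are hypothetical.

```python
def split_by_shape(model_shapes, pretrained_shapes):
    """Partition model parameter names by whether the checkpoint value fits.

    model_shapes, pretrained_shapes: dict of parameter name -> shape (tuple or list).
    Returns (reused, reinitialized):
      reused        - names whose checkpoint shape matches; these would be copied
      reinitialized - names missing from the checkpoint or with a different shape;
                      these would keep their fresh (random) initialization
    """
    reused, reinitialized = [], []
    for name, shape in model_shapes.items():
        ckpt_shape = pretrained_shapes.get(name)
        if ckpt_shape is not None and tuple(ckpt_shape) == tuple(shape):
            reused.append(name)
        else:
            reinitialized.append(name)
    return reused, reinitialized


# Head shapes taken from the warnings in this issue; the backbone entry is hypothetical.
model = {
    "backbone.conv1.weight": (64, 1, 3, 3),
    "head.vsfd.fc1.weight": (512, 188),
    "head.vsfd.fc1.bias": (188,),
}
checkpoint = {
    "backbone.conv1.weight": (64, 1, 3, 3),
    "head.vsfd.fc1.weight": (512, 38),
    "head.vsfd.fc1.bias": (38,),
}
reused, reinitialized = split_by_shape(model, checkpoint)
print(reused)         # ['backbone.conv1.weight']
print(reinitialized)  # ['head.vsfd.fc1.weight', 'head.vsfd.fc1.bias']
```

The design choice here is to treat a shape mismatch exactly like a missing key, so the backbone features still benefit from pretraining while only the dictionary-sized layers start over.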
I initially tried loading the pretrained model using the command:
python tools/train.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model=./pretrained_models/rec_r50_vd_srn_train/best_accuracy
and I get these warnings:
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.wrap_encoder0.prepare_decoder.emb0.weight [189, 512] not matched with loaded params head.gsrm.wrap_encoder0.prepare_decoder.emb0.weight [39, 512] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.wrap_encoder1.prepare_decoder.emb0.weight [189, 512] not matched with loaded params head.gsrm.wrap_encoder1.prepare_decoder.emb0.weight [39, 512] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.vsfd.fc1.weight [512, 188] not matched with loaded params head.vsfd.fc1.weight [512, 38] !
Is there a way to train the model from scratch, or am I using the wrong command?
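To see exactly which layers these warnings affect, the log can be parsed with a short script. The regular expression is written against the message format that appears in this log and is an assumption about that format; `mismatched_params` is a hypothetical helper. It lists each parameter with its model and checkpoint shapes, which makes it easy to confirm that only `head.*` parameters (the dictionary-sized ones) differ.

```python
import re

# Matches messages like:
# The shape of model params NAME [a, b] not matched with loaded params NAME [c, d] !
WARNING_RE = re.compile(
    r"The shape of model params (\S+) \[([\d, ]+)\] "
    r"not matched with loaded params \S+ \[([\d, ]+)\]"
)


def parse_shape(text):
    """Turn a bracket body such as '512, 188' into the tuple (512, 188)."""
    return tuple(int(x) for x in text.split(","))


def mismatched_params(log_text):
    """Return [(name, model_shape, checkpoint_shape), ...] found in log_text."""
    return [
        (name, parse_shape(model), parse_shape(ckpt))
        for name, model, ckpt in WARNING_RE.findall(log_text)
    ]


log = (
    "[2024/05/04 03:21:52] ppocr WARNING: The shape of model params "
    "head.vsfd.fc1.weight [512, 188] not matched with loaded params "
    "head.vsfd.fc1.weight [512, 38] ! "
    "[2024/05/04 03:21:52] ppocr WARNING: The shape of model params "
    "head.vsfd.fc1.bias [188] not matched with loaded params "
    "head.vsfd.fc1.bias [38] !"
)
for name, model_shape, ckpt_shape in mismatched_params(log):
    print(name, model_shape, ckpt_shape)
```

Pointing `mismatched_params` at a full training log (read with `open(...).read()`) would give the complete list of re-initialized parameters.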
I just noticed that this model's config does not set a dictionary path (https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/rec_r50_fpn_srn.yml). Which dictionary was the pretrained model trained on?
Also, regarding training the recognition model with ResNet50 as the backbone: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/algorithm_rec_srn_en.md says that the model was trained on two text recognition datasets, MJSynth and SynthText. If I plan to train the model from scratch, how was this done? Did you combine the two datasets and train once, or train on one dataset first and then continue training on the second?
Could you provide more information on this?
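How the published model was trained is for the maintainers to answer. For reference, one common way to use two datasets in a single run is to merge their label lines, optionally subsample each by a ratio, and shuffle; since `SimpleDataSet` accepts a `label_file_list`, listing both label files there is one way to train on both at once. The sketch below shows only the mixing step in plain Python. `mix_label_lines`, the ratio semantics, and the sample data are hypothetical, and this is not a description of how MJSynth and SynthText were combined for the pretrained model.

```python
import random


def mix_label_lines(sources, ratios=None, seed=0):
    """Combine label lines from several datasets into one shuffled list.

    sources: list of lists of label lines ('img_path\\tlabel'), one list per dataset.
    ratios:  optional fractions in (0, 1]; each dataset is subsampled to
             round(len(lines) * ratio) lines before mixing. Defaults to 1.0 each.
    """
    if ratios is None:
        ratios = [1.0] * len(sources)
    if len(ratios) != len(sources):
        raise ValueError("need one ratio per source")
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    mixed = []
    for lines, ratio in zip(sources, ratios):
        k = round(len(lines) * ratio)
        mixed.extend(rng.sample(lines, k))  # sample without replacement
    rng.shuffle(mixed)  # interleave the datasets
    return mixed


mj = [f"mj/{i}.jpg\tword{i}" for i in range(10)]  # hypothetical dataset A
st = [f"st/{i}.jpg\ttext{i}" for i in range(4)]   # hypothetical dataset B
mixed = mix_label_lines([mj, st], ratios=[0.5, 1.0])
print(len(mixed))  # 9
```

Mixing in one run keeps both data distributions present in every epoch; training on one dataset and then the other tends to bias the final weights toward the second one, which is why a single combined run is often the simpler default.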
You can ignore the weight-mismatch warnings and use your own dictionary, which essentially means retraining the model. For details, refer to the following documentation:
https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/recognition_en.md
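When switching to your own dictionary, it can help to check the label file against it before training, since characters missing from the dictionary cannot be learned. This sketch assumes the `SimpleDataSet` label format of one `image_path<TAB>label` pair per line and uses the `max_text_length: 25` from the config above; `check_label_lines` and the sample data are hypothetical, and the check is a convenience, not part of PaddleOCR.

```python
def check_label_lines(lines, dict_chars, use_space_char=True, max_text_length=25):
    """Report problems in 'image_path<TAB>label' lines.

    Returns a list of (line_number, message); an empty list means no problems found.
    """
    charset = set(dict_chars)
    if use_space_char:
        charset.add(" ")
    problems = []
    for no, line in enumerate(lines, start=1):
        parts = line.rstrip("\r\n").split("\t", 1)
        if len(parts) != 2 or not parts[0] or not parts[1]:
            problems.append((no, "expected 'image_path<TAB>label'"))
            continue
        label = parts[1]
        missing = sorted(set(label) - charset)
        if missing:
            problems.append((no, f"characters not in dictionary: {missing}"))
        if len(label) > max_text_length:
            problems.append((no, f"label longer than {max_text_length} characters"))
    return problems


dict_chars = list("abcdefghijklmnopqrstuvwxyzé")  # stand-in for a real dictionary file
lines = [
    "imgs/1.jpg\tcafé",
    "imgs/2.jpg\tnaïve",
    "imgs/3.jpg no tab here",
]
for no, msg in check_label_lines(lines, dict_chars):
    print(no, msg)
```

In practice `dict_chars` would come from reading the dictionary file line by line and `lines` from the label file listed in `label_file_list`.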
Please provide the following information to quickly locate the problem:
python tools/train.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model=./pretrained_models/rec_r50_vd_srn_train/best_accuracy
[2024/05/04 03:21:49] ppocr INFO: Architecture :
[2024/05/04 03:21:49] ppocr INFO:     Backbone :
[2024/05/04 03:21:49] ppocr INFO:         name : ResNetFPN
[2024/05/04 03:21:49] ppocr INFO:     Head :
[2024/05/04 03:21:49] ppocr INFO:         hidden_dims : 512
[2024/05/04 03:21:49] ppocr INFO:         max_text_length : 25
[2024/05/04 03:21:49] ppocr INFO:         name : SRNHead
[2024/05/04 03:21:49] ppocr INFO:         num_decoder_TUs : 4
[2024/05/04 03:21:49] ppocr INFO:         num_encoder_TUs : 2
[2024/05/04 03:21:49] ppocr INFO:         num_heads : 8
[2024/05/04 03:21:49] ppocr INFO:     Transform : None
[2024/05/04 03:21:49] ppocr INFO:     algorithm : SRN
[2024/05/04 03:21:49] ppocr INFO:     in_channels : 1
[2024/05/04 03:21:49] ppocr INFO:     model_type : rec
[2024/05/04 03:21:49] ppocr INFO: Eval :
[2024/05/04 03:21:49] ppocr INFO:     dataset :
[2024/05/04 03:21:49] ppocr INFO:         data_dir : ./dataset/recognition/v2_img_eval_rec/
[2024/05/04 03:21:49] ppocr INFO:         label_file_list : ['./dataset/recognition/v2_rec_gt_eval.txt']
[2024/05/04 03:21:49] ppocr INFO:         name : SimpleDataSet
[2024/05/04 03:21:49] ppocr INFO:         transforms :
[2024/05/04 03:21:49] ppocr INFO:             DecodeImage :
[2024/05/04 03:21:49] ppocr INFO:                 channel_first : False
[2024/05/04 03:21:49] ppocr INFO:                 img_mode : BGR
[2024/05/04 03:21:49] ppocr INFO:             SRNLabelEncode : None
[2024/05/04 03:21:49] ppocr INFO:             SRNRecResizeImg :
[2024/05/04 03:21:49] ppocr INFO:                 image_shape : [1, 64, 256]
[2024/05/04 03:21:49] ppocr INFO:             KeepKeys :
[2024/05/04 03:21:49] ppocr INFO:                 keep_keys : ['image', 'label', 'length', 'encoder_word_pos', 'gsrm_word_pos', 'gsrm_slf_attn_bias1', 'gsrm_slf_attn_bias2']
[2024/05/04 03:21:49] ppocr INFO:     loader :
[2024/05/04 03:21:49] ppocr INFO:         batch_size_per_card : 64
[2024/05/04 03:21:49] ppocr INFO:         drop_last : False
[2024/05/04 03:21:49] ppocr INFO:         num_workers : 4
[2024/05/04 03:21:49] ppocr INFO:         shuffle : False
[2024/05/04 03:21:49] ppocr INFO: Global :
[2024/05/04 03:21:49] ppocr INFO:     cal_metric_during_train : True
[2024/05/04 03:21:49] ppocr INFO:     character_dict_path : ./ppocr/utils/dict/latin_dict.txt
[2024/05/04 03:21:49] ppocr INFO:     checkpoints : None
[2024/05/04 03:21:49] ppocr INFO:     distributed : False
[2024/05/04 03:21:49] ppocr INFO:     epoch_num : 2
[2024/05/04 03:21:49] ppocr INFO:     eval_batch_step : [0, 5000]
[2024/05/04 03:21:49] ppocr INFO:     infer_img : doc/imgs_words/ch/word_1.jpg
[2024/05/04 03:21:49] ppocr INFO:     infer_mode : False
[2024/05/04 03:21:49] ppocr INFO:     log_smooth_window : 20
[2024/05/04 03:21:49] ppocr INFO:     max_text_length : 25
[2024/05/04 03:21:49] ppocr INFO:     num_heads : 8
[2024/05/04 03:21:49] ppocr INFO:     pretrained_model : ./pretrained_models/rec_r50_vd_srn_train/best_accuracy
[2024/05/04 03:21:49] ppocr INFO:     print_batch_step : 20
[2024/05/04 03:21:49] ppocr INFO:     save_epoch_step : 1
[2024/05/04 03:21:49] ppocr INFO:     save_inference_dir : None
[2024/05/04 03:21:49] ppocr INFO:     save_model_dir : ./output/rec_resnet50
[2024/05/04 03:21:49] ppocr INFO:     save_res_path : ./output/rec/predicts_srn.txt
[2024/05/04 03:21:49] ppocr INFO:     use_gpu : False
[2024/05/04 03:21:49] ppocr INFO:     use_space_char : True
[2024/05/04 03:21:49] ppocr INFO:     use_visualdl : False
[2024/05/04 03:21:49] ppocr INFO: Loss :
[2024/05/04 03:21:49] ppocr INFO:     name : SRNLoss
[2024/05/04 03:21:49] ppocr INFO: Metric :
[2024/05/04 03:21:49] ppocr INFO:     main_indicator : acc
[2024/05/04 03:21:49] ppocr INFO:     name : RecMetric
[2024/05/04 03:21:49] ppocr INFO: Optimizer :
[2024/05/04 03:21:49] ppocr INFO:     beta1 : 0.9
[2024/05/04 03:21:49] ppocr INFO:     beta2 : 0.999
[2024/05/04 03:21:49] ppocr INFO:     clip_norm : 10.0
[2024/05/04 03:21:49] ppocr INFO:     lr :
[2024/05/04 03:21:49] ppocr INFO:         learning_rate : 0.0001
[2024/05/04 03:21:49] ppocr INFO:     name : Adam
[2024/05/04 03:21:49] ppocr INFO: PostProcess :
[2024/05/04 03:21:49] ppocr INFO:     name : SRNLabelDecode
[2024/05/04 03:21:49] ppocr INFO: Train :
[2024/05/04 03:21:49] ppocr INFO:     dataset :
[2024/05/04 03:21:49] ppocr INFO:         data_dir : ./dataset/recognition/v2_img_train_rec/
[2024/05/04 03:21:49] ppocr INFO:         label_file_list : ['./dataset/recognition/v2_rec_gt_train.txt']
[2024/05/04 03:21:49] ppocr INFO:         name : SimpleDataSet
[2024/05/04 03:21:49] ppocr INFO:         transforms :
[2024/05/04 03:21:49] ppocr INFO:             DecodeImage :
[2024/05/04 03:21:49] ppocr INFO:                 channel_first : False
[2024/05/04 03:21:49] ppocr INFO:                 img_mode : BGR
[2024/05/04 03:21:49] ppocr INFO:             SRNLabelEncode : None
[2024/05/04 03:21:49] ppocr INFO:             SRNRecResizeImg :
[2024/05/04 03:21:49] ppocr INFO:                 image_shape : [1, 64, 256]
[2024/05/04 03:21:49] ppocr INFO:             KeepKeys :
[2024/05/04 03:21:49] ppocr INFO:                 keep_keys : ['image', 'label', 'length', 'encoder_word_pos', 'gsrm_word_pos', 'gsrm_slf_attn_bias1', 'gsrm_slf_attn_bias2']
[2024/05/04 03:21:49] ppocr INFO:     loader :
[2024/05/04 03:21:49] ppocr INFO:         batch_size_per_card : 64
[2024/05/04 03:21:49] ppocr INFO:         drop_last : False
[2024/05/04 03:21:49] ppocr INFO:         num_workers : 8
[2024/05/04 03:21:49] ppocr INFO:         shuffle : False
[2024/05/04 03:21:49] ppocr INFO: profiler_options : None
[2024/05/04 03:21:49] ppocr INFO: train with paddle 2.5.0 and device Place(cpu)
[2024/05/04 03:21:49] ppocr INFO: Initialize indexs of datasets:['./dataset/recognition/v2_rec_gt_train.txt']
list index out of range
[2024/05/04 03:21:49] ppocr INFO: Initialize indexs of datasets:['./dataset/recognition/v2_rec_gt_eval.txt']
[2024/05/04 03:21:51] ppocr INFO: train dataloader has 136 iters
[2024/05/04 03:21:51] ppocr INFO: valid dataloader has 34 iters
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.fc0.weight [512, 188] not matched with loaded params head.gsrm.fc0.weight [512, 38] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.fc0.bias [188] not matched with loaded params head.gsrm.fc0.bias [38] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.wrap_encoder0.prepare_decoder.emb0.weight [189, 512] not matched with loaded params head.gsrm.wrap_encoder0.prepare_decoder.emb0.weight [39, 512] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.gsrm.wrap_encoder1.prepare_decoder.emb0.weight [189, 512] not matched with loaded params head.gsrm.wrap_encoder1.prepare_decoder.emb0.weight [39, 512] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.vsfd.fc1.weight [512, 188] not matched with loaded params head.vsfd.fc1.weight [512, 38] !
[2024/05/04 03:21:52] ppocr WARNING: The shape of model params head.vsfd.fc1.bias [188] not matched with loaded params head.vsfd.fc1.bias [38] !
[2024/05/04 03:21:53] ppocr INFO: load pretrain successful from ./pretrained_models/rec_r50_vd_srn_train/best_accuracy
[2024/05/04 03:21:53] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 5000 iterations
Hello, I am trying to fine-tune the SRN recognition model of PaddleOCR, and I get an error saying that the shape of the model params does not match the loaded params. I followed the steps in this documentation for the configuration file and for downloading the model: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/algorithm_rec_srn_en.md
I also checked existing issues about the YAML file: the config file currently provided in the repo does not work as-is because it uses LMDBDataSet, which must be changed to SimpleDataSet. I made the same change others did, yet I'm still getting an error.
Any help or advice on what to do?