Unable to resume training after adding more characters to dictionary.

asif-ca commented 5 months ago

Please provide the following information to quickly locate the problem

System Environment: 20.04.6 LTS
Version: Paddle: 2.5.1, PaddleOCR: 2.5
Related components:
Command Code: python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model=pretrained_models/rec/ch_PP-OCRv3_rec_train/best_accuracy.pdparams Global.checkpoints=output/rec_train_10_JAN_2024/iter_epoch_30
Complete Error Message:

Variable Shape not match, Variable [ linear_8.w_0_moment1_0 ] need tensor with shape [64, 10962] but load set tensor with shape [64, 10270]

After I updated the dictionary with characters for more languages and prepared the data, I tried to resume training. It could not resume: it showed this error and exited.

Variable Shape not match, Variable [ linear_8.w_0_moment1_0 ] need tensor with shape [64, 10962] but load set tensor with shape [64, 10270]

Can anyone confirm if it's possible to resume the training after adding more characters to the dictionary file or does it need to start from epoch 0 with your provided pre-trained rec model?

I am adding more languages to the model incrementally, and I don't want to start from scratch each time I collect data for and add another language. Is there any approach I can use to resume training where I left off after adding more characters to the dictionary?

@shiyutang @cuicheng01 @andyjiang1116 any suggestions?

jzhang533 commented 5 months ago

> Can anyone confirm if it's possible to resume the training after adding more characters to the dictionary file or does it need to start from epoch 0 with your provided pre-trained rec model?

You need to train the model from the beginning after adding more characters to the dictionary, since the shapes of the computing layers change when the dictionary size changes.
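To illustrate the point about shapes, here is a minimal sketch that compares the variable shapes stored in a checkpoint with the shapes the rebuilt model expects, using plain tuples instead of real tensors. Only `linear_8.w_0_moment1_0` and its two shapes come from the error message in this issue (the name suggests an optimizer state, which is an inference, not something confirmed here). The function `find_shape_mismatches` and the `hypothetical_backbone.w_0` entry are made up for illustration and are not part of PaddleOCR.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    model_shapes: Dict[str, Shape], checkpoint_shapes: Dict[str, Shape]
) -> List[str]:
    """Return one message per variable whose shape differs between model and checkpoint."""
    problems = []
    for name, need in model_shapes.items():
        have = checkpoint_shapes.get(name)
        if have is not None and have != need:
            problems.append(f"{name}: need {list(need)}, checkpoint has {list(have)}")
    return problems


# Shapes from the error message: the last dimension follows the dictionary size.
OLD_CLASSES, NEW_CLASSES = 10270, 10962

# What the checkpoint saved before the dictionary was extended.
checkpoint_shapes = {
    "linear_8.w_0_moment1_0": (64, OLD_CLASSES),
    "hypothetical_backbone.w_0": (3, 3),  # made-up entry: independent of the dictionary
}

# What the model built from the extended dictionary expects.
model_shapes = {
    "linear_8.w_0_moment1_0": (64, NEW_CLASSES),
    "hypothetical_backbone.w_0": (3, 3),
}

mismatches = find_shape_mismatches(model_shapes, checkpoint_shapes)
added_classes = NEW_CLASSES - OLD_CLASSES  # extra output classes after the dictionary change

for message in mismatches:
    print(message)
print("added output classes:", added_classes)
```

Only the variable whose last dimension depends on the dictionary size is reported; the dictionary-independent entry still matches, which is why the failure shows up in the output layer and its saved state.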

A workaround I can think of would be to use a large enough dictionary from the beginning, fill the unused entries with meaningless placeholder characters, and replace those placeholders with your actual characters later.
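A minimal sketch of this placeholder workaround, assuming the dictionary is a list with one character per entry (as in a one-character-per-line dictionary file). The helper names `pad_dictionary` and `fill_placeholders` are hypothetical, and using the Unicode Private Use Area (U+E000 to U+F8FF) for placeholders is my own choice, on the assumption that those code points never appear in real training labels. Whether PaddleOCR handles such characters without issues has not been verified here.

```python
from typing import List

PLACEHOLDER_START = 0xE000  # first code point of the Unicode Private Use Area
PLACEHOLDER_END = 0xF8FF    # last code point of that block


def is_placeholder(ch: str) -> bool:
    """True if ch is a single Private Use Area character."""
    return len(ch) == 1 and PLACEHOLDER_START <= ord(ch) <= PLACEHOLDER_END


def pad_dictionary(chars: List[str], capacity: int) -> List[str]:
    """Append placeholder characters until the dictionary has `capacity` entries."""
    if len(chars) > capacity:
        raise ValueError("dictionary already exceeds the requested capacity")
    used = set(chars)
    padding: List[str] = []
    code = PLACEHOLDER_START
    while len(chars) + len(padding) < capacity:
        if code > PLACEHOLDER_END:
            raise ValueError("not enough placeholder code points available")
        ch = chr(code)
        if ch not in used:  # never duplicate an entry already in the dictionary
            padding.append(ch)
        code += 1
    return list(chars) + padding


def fill_placeholders(padded: List[str], new_chars: List[str]) -> List[str]:
    """Replace placeholders in place with new characters; size and old indices stay fixed."""
    result = list(padded)
    existing = set(padded)
    # Drop characters already present and de-duplicate while keeping order.
    pending = [c for c in dict.fromkeys(new_chars) if c not in existing]
    slots = [i for i, c in enumerate(result) if is_placeholder(c)]
    if len(pending) > len(slots):
        raise ValueError("not enough placeholder slots left for the new characters")
    for index, ch in zip(slots, pending):
        result[index] = ch
    return result


base = ["a", "b", "c"]
padded = pad_dictionary(base, capacity=6)        # 3 real characters + 3 placeholders
updated = fill_placeholders(padded, ["d", "e"])  # two placeholders become real characters
print(len(padded), len(updated))
```

Because new characters overwrite placeholder slots instead of being appended, the dictionary length and the index of every existing character stay the same, which is what keeps the saved shapes compatible.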

asif-ca commented 5 months ago

@jzhang533 Thanks for your comment.

1: What do you think the impact on the model's accuracy will be if meaningless characters are added to the dictionary and no training data is available for those characters?

2: If we reduce the dictionary size later, once we are sure that only certain characters are needed for the new languages, will it be possible to resume the training? Won't the shape mismatch arise at that point as well, because the previously trained model would have more characters and we would have removed the meaningless ones?

I am afraid that a large dictionary with many meaningless characters will affect the model's performance at inference.

jzhang533 commented 5 months ago

> 1: What do you think the impact on the model's accuracy will be if meaningless characters are added to the dictionary and no training data is available for those characters?

The accuracy shouldn't be affected, since the model weights for those meaningless characters will remain unchanged during training, as there is no training data for them.

> 2: If we reduce the dictionary size later, once we are sure that only certain characters are needed for the new languages, will it be possible to resume the training? Won't the shape mismatch arise at that point as well, because the previously trained model would have more characters and we would have removed the meaningless ones?

I think you will need to keep the dictionary size the same at all times.
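One way to guard against accidentally changing the size is to compare the old and new dictionaries before resuming. The sketch below is a hypothetical helper (`check_dictionary_compatible` is not a PaddleOCR function). It assumes one character per entry and that a character's class index is its position in the dictionary, and it treats Private Use Area characters as placeholders purely for illustration.

```python
from typing import Callable, List


def check_dictionary_compatible(
    old: List[str], new: List[str], is_placeholder: Callable[[str], bool]
) -> List[str]:
    """Return a list of problems that would make `new` incompatible with `old`."""
    problems = []
    if len(old) != len(new):
        problems.append(f"size changed: {len(old)} -> {len(new)}")
    for index, (old_ch, new_ch) in enumerate(zip(old, new)):
        # Only placeholder slots may change; trained characters must keep their index.
        if old_ch != new_ch and not is_placeholder(old_ch):
            problems.append(
                f"index {index}: trained character {old_ch!r} replaced by {new_ch!r}"
            )
    return problems


def pua_placeholder(ch: str) -> bool:
    """Illustrative placeholder test: a single Unicode Private Use Area character."""
    return len(ch) == 1 and 0xE000 <= ord(ch) <= 0xF8FF


old_dict = ["a", "b", "\ue000", "\ue001"]
good_dict = ["a", "b", "c", "\ue001"]  # one placeholder filled, same size
bad_dict = ["a", "c", "b"]             # shrunk, and a trained slot was overwritten

ok_problems = check_dictionary_compatible(old_dict, good_dict, pua_placeholder)
bad_problems = check_dictionary_compatible(old_dict, bad_dict, pua_placeholder)
print(ok_problems)
print(bad_problems)
```

An empty result means the new dictionary keeps the same length and leaves every already-trained character at its original position; removing placeholders, as asked in question 2, would be reported as a size change.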

PaddlePaddle / PaddleOCR

Unable to resume training after adding more characters to dictionary. #11845