dbiir / UER-py

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
https://github.com/dbiir/UER-py/wiki
Apache License 2.0
3.01k stars 525 forks

word_embedding shape error after replacing the vocab file #308

Open johnsongwx opened 2 years ago

johnsongwx commented 2 years ago

RuntimeError: Error(s) in loading state_dict for Classifier: size mismatch for embedding.word_embedding.weight: copying a param with shape torch.Size([30522, 768]) from checkpoint, the shape in current model is torch.Size([28895, 768]).

Could you tell me where this parameter can be changed?

Embedding commented 2 years ago

Could you share the command you ran? Most likely the loaded model and the vocabulary being used do not match.
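A quick way to confirm this kind of mismatch is to compare the number of entries in the vocab file with the number of rows in the checkpoint's word embedding (30522 in the error above). The following is only a minimal sketch: the helper names `vocab_stats` and `compare_with_checkpoint` are hypothetical and not part of UER-py, and it assumes a plain one-token-per-line vocab file. It reports both the raw line count and the count of unique non-empty tokens, since it is not confirmed in this thread which of the two the model uses to size the embedding.

```python
import os
import tempfile


def vocab_stats(vocab_path):
    """Return (line_count, unique_token_count) for a one-token-per-line vocab file."""
    line_count = 0
    unique_tokens = set()
    with open(vocab_path, encoding="utf-8") as f:
        for line in f:
            line_count += 1
            token = line.strip()
            if token:  # blank lines do not contribute a token
                unique_tokens.add(token)
    return line_count, len(unique_tokens)


def compare_with_checkpoint(checkpoint_rows, vocab_path):
    """Compare the checkpoint's embedding row count with the vocab file.

    checkpoint_rows is the first dimension of the word embedding in the
    checkpoint, e.g. 30522 in the error message. With PyTorch installed it
    could be read (assumption: the key name matches the error message) as
    torch.load(path, map_location="cpu")["embedding.word_embedding.weight"].shape[0]
    """
    line_count, unique_count = vocab_stats(vocab_path)
    return {
        "checkpoint_rows": checkpoint_rows,
        "vocab_lines": line_count,
        "vocab_unique_tokens": unique_count,
        "lines_match": checkpoint_rows == line_count,
        "unique_match": checkpoint_rows == unique_count,
    }


# Tiny demonstration with a throwaway vocab file (not a real model vocab).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("[PAD]\n[UNK]\nhello\nhello\n")  # 4 lines, 3 unique tokens
print(compare_with_checkpoint(4, demo_path))
os.remove(demo_path)
```

If the line count equals the checkpoint's row count but the unique-token count is smaller, duplicate or blank entries in the vocab file would be one thing to look at; if neither matches, the vocab file probably belongs to a different model.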

johnsongwx commented 2 years ago

python3 finetune/run_classifier.py --pretrained_model_path models/PubMedBERT/PubMedBERT.bin \
    --vocab_path models/PubMedBERT/vocab.txt \
    --config_path models/PubMedBERT/config.json \
    --train_path datasets/PubMedQA/dev.tsv \
    --dev_path datasets/PubMedQA/dev.tsv \
    --epochs_num 1 --batch_size 32

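The model file was converted with the conversion script you provide. The config file is the one published on Hugging Face, used as-is. Does it also need to be converted? I noticed that the field names are different.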