Q1: Pretraining with the converted model fails.

Run the conversion script:

python3 scripts/convert_bert_from_huggingface_to_uer.py --input_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model.bin \
    --output_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin \
    --layers_num 24
After converting the data (about 15 MB of corpus), run pretraining:
CUDA_VISIBLE_DEVICES=1,2 python3 pretrain.py --dataset_path dataset.pt --vocab_path models/google_zh_vocab.txt --pretrained_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin \
    --output_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer_mlm.bin --world_size 2 --gpu_ranks 0 1 \
    --total_steps 5000 --save_checkpoint_steps 1000 --embedding bert --encoder bert --target mlm
This raises the following error:

RuntimeError: Error(s) in loading state_dict for Model:
size mismatch for embedding.word_embedding.weight: copying a param with shape torch.Size([21128, 1024]) from checkpoint, the shape in current model is torch.Size([21128, 768]).
size mismatch for embedding.position_embedding.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([512, 768]).

Did something go wrong when converting the BERT-large model?
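One way to narrow this down is to load the converted checkpoint and print the embedding shapes directly. If they already have a trailing dimension of 1024, the conversion itself is fine and the mismatch comes from the model that pretrain.py builds (base-size, hidden 768). This is only a sketch, assuming the converted file is at the path used above:

import torch

# Inspect the converted checkpoint's tensor shapes (diagnostic sketch, not part of UER-py).
state_dict = torch.load(
    "models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin",
    map_location="cpu",
)
for key in ("embedding.word_embedding.weight", "embedding.position_embedding.weight"):
    if key in state_dict:
        print(key, tuple(state_dict[key].shape))
# If both print (..., 1024), the conversion produced large-size weights and the
# RuntimeError means pretrain.py instantiated a base-size (768) model instead.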
Q2: Skip the conversion and pretrain directly from the Hugging Face model:

CUDA_VISIBLE_DEVICES=1,2 python3 pretrain.py --dataset_path dataset.pt --vocab_path models/google_zh_vocab.txt --pretrained_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model.bin \
    --output_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer_mlm.bin --world_size 2 --gpu_ranks 0 1 \
    --total_steps 5000 --save_checkpoint_steps 1000 --embedding bert --encoder bert --target mlm
This runs without errors, but the model accuracy is very low, which looks a lot like issue #32 "Can we use bert wwm pretrain model directly?". The output is as follows:

| 100/ 5000 steps| 28313.10 tokens/s| loss 9.36| acc: 0.025
| 200/ 5000 steps| 28757.11 tokens/s| loss 8.19| acc: 0.034
| 300/ 5000 steps| 28903.85 tokens/s| loss 7.36| acc: 0.032
| 400/ 5000 steps| 29195.85 tokens/s| loss 6.85| acc: 0.045
| 500/ 5000 steps| 28838.23 tokens/s| loss 6.66| acc: 0.057
| 600/ 5000 steps| 28747.72 tokens/s| loss 6.58| acc: 0.062
| 700/ 5000 steps| 29120.63 tokens/s| loss 6.44| acc: 0.070
| 800/ 5000 steps| 28985.53 tokens/s| loss 6.40| acc: 0.080
| 900/ 5000 steps| 28426.76 tokens/s| loss 6.34| acc: 0.089
| 1000/ 5000 steps| 29011.21 tokens/s| loss 6.27| acc: 0.096
| 1100/ 5000 steps| 28754.28 tokens/s| loss 6.24| acc: 0.097
| 1200/ 5000 steps| 28884.74 tokens/s| loss 6.21| acc: 0.103
| 1300/ 5000 steps| 28976.05 tokens/s| loss 6.23| acc: 0.108
| 1400/ 5000 steps| 29083.27 tokens/s| loss 6.16| acc: 0.111
| 1500/ 5000 steps| 29082.03 tokens/s| loss 6.14| acc: 0.109
| 1600/ 5000 steps| 29118.99 tokens/s| loss 6.12| acc: 0.117
| 1700/ 5000 steps| 28007.29 tokens/s| loss 6.11| acc: 0.117
| 1800/ 5000 steps| 29114.06 tokens/s| loss 6.11| acc: 0.121
| 1900/ 5000 steps| 28985.75 tokens/s| loss 6.08| acc: 0.118
| 2000/ 5000 steps| 29076.29 tokens/s| loss 6.10| acc: 0.123
| 2100/ 5000 steps| 28764.05 tokens/s| loss 6.05| acc: 0.124
| 2200/ 5000 steps| 29038.47 tokens/s| loss 5.98| acc: 0.127
| 2300/ 5000 steps| 29029.31 tokens/s| loss 5.99| acc: 0.126
| 2400/ 5000 steps| 29082.48 tokens/s| loss 6.01| acc: 0.127
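My unconfirmed guess is that when the raw Hugging Face file is passed to pretrain.py, hardly any weights are actually loaded because the parameter names differ (e.g. bert.embeddings.word_embeddings.weight vs. embedding.word_embedding.weight), so training effectively starts from randomly initialized weights, which would explain the slow, from-scratch-looking accuracy curve above. A small sketch to compare the key names, assuming both files exist at the paths used earlier:

import torch

# Load both checkpoints on CPU just to compare their parameter names.
hf = torch.load("models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model.bin", map_location="cpu")
uer = torch.load("models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin", map_location="cpu")

# Print a few keys from each checkpoint to see whether the naming schemes overlap.
print("HF keys:", sorted(hf.keys())[:5])
print("UER keys:", sorted(uer.keys())[:5])

# Count how many parameter names the two checkpoints share; if this is zero,
# loading the raw HF file into the UER model cannot transfer any weights.
print("shared keys:", len(set(hf.keys()) & set(uer.keys())))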
Merry Christmas in advance! Looking forward to your reply.