dbiir / UER-py

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
https://github.com/dbiir/UER-py/wiki
Apache License 2.0

Error when pre-training with a converted Hugging Face model, and very low accuracy when pre-training directly without conversion #86

Closed watermelon-lee closed 3 years ago

watermelon-lee commented 3 years ago

Q1: Pre-training with the converted model fails.

I ran the conversion script:

    python3 scripts/convert_bert_from_huggingface_to_uer.py \
        --input_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model.bin \
        --output_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin \
        --layers_num 24

After converting the data (a corpus of about 15 MB), I started pre-training:

    CUDA_VISIBLE_DEVICES=1,2 python3 pretrain.py --dataset_path dataset.pt \
        --vocab_path models/google_zh_vocab.txt \
        --pretrained_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin \
        --output_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer_mlm.bin \
        --world_size 2 --gpu_ranks 0 1 \
        --total_steps 5000 --save_checkpoint_steps 1000 \
        --embedding bert --encoder bert --target mlm

This fails with:

    RuntimeError: Error(s) in loading state_dict for Model:
        size mismatch for embedding.word_embedding.weight: copying a param with shape torch.Size([21128, 1024]) from checkpoint, the shape in current model is torch.Size([21128, 768]).
        size mismatch for embedding.position_embedding.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([512, 768]).

Was the BERT-large model perhaps converted incorrectly during conversion?
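The shapes in the error actually suggest the converted checkpoint is fine: 1024 is the hidden size of BERT-large, while 768 is the hidden size of the model that pretrain.py builds from its default BERT-base config. A minimal sanity-check sketch (assuming the converted path used above and that the file is a plain state dict):

```python
# Minimal sketch: inspect the converted checkpoint (path as used above).
# For a BERT-large checkpoint the second dimension should be 1024; the
# default BERT-base config expects 768, which explains the size mismatch.
import torch

state_dict = torch.load(
    "models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin",
    map_location="cpu",
)
print(state_dict["embedding.word_embedding.weight"].shape)      # expected: torch.Size([21128, 1024])
print(state_dict["embedding.position_embedding.weight"].shape)  # expected: torch.Size([512, 1024])
```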

Q2: Pre-training directly with the Hugging Face model, without using the converted checkpoint:

    CUDA_VISIBLE_DEVICES=1,2 python3 pretrain.py --dataset_path dataset.pt \
        --vocab_path models/google_zh_vocab.txt \
        --pretrained_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model.bin \
        --output_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer_mlm.bin \
        --world_size 2 --gpu_ranks 0 1 \
        --total_steps 5000 --save_checkpoint_steps 1000 \
        --embedding bert --encoder bert --target mlm

This runs without errors, but the model's accuracy is very low, which looks similar to issue #32 (Can we use bert wwm pretrain model directly?). The output is:

    | 100/ 5000 steps| 28313.10 tokens/s| loss 9.36| acc: 0.025
    | 200/ 5000 steps| 28757.11 tokens/s| loss 8.19| acc: 0.034
    | 300/ 5000 steps| 28903.85 tokens/s| loss 7.36| acc: 0.032
    | 400/ 5000 steps| 29195.85 tokens/s| loss 6.85| acc: 0.045
    | 500/ 5000 steps| 28838.23 tokens/s| loss 6.66| acc: 0.057
    | 600/ 5000 steps| 28747.72 tokens/s| loss 6.58| acc: 0.062
    | 700/ 5000 steps| 29120.63 tokens/s| loss 6.44| acc: 0.070
    | 800/ 5000 steps| 28985.53 tokens/s| loss 6.40| acc: 0.080
    | 900/ 5000 steps| 28426.76 tokens/s| loss 6.34| acc: 0.089
    | 1000/ 5000 steps| 29011.21 tokens/s| loss 6.27| acc: 0.096
    | 1100/ 5000 steps| 28754.28 tokens/s| loss 6.24| acc: 0.097
    | 1200/ 5000 steps| 28884.74 tokens/s| loss 6.21| acc: 0.103
    | 1300/ 5000 steps| 28976.05 tokens/s| loss 6.23| acc: 0.108
    | 1400/ 5000 steps| 29083.27 tokens/s| loss 6.16| acc: 0.111
    | 1500/ 5000 steps| 29082.03 tokens/s| loss 6.14| acc: 0.109
    | 1600/ 5000 steps| 29118.99 tokens/s| loss 6.12| acc: 0.117
    | 1700/ 5000 steps| 28007.29 tokens/s| loss 6.11| acc: 0.117
    | 1800/ 5000 steps| 29114.06 tokens/s| loss 6.11| acc: 0.121
    | 1900/ 5000 steps| 28985.75 tokens/s| loss 6.08| acc: 0.118
    | 2000/ 5000 steps| 29076.29 tokens/s| loss 6.10| acc: 0.123
    | 2100/ 5000 steps| 28764.05 tokens/s| loss 6.05| acc: 0.124
    | 2200/ 5000 steps| 29038.47 tokens/s| loss 5.98| acc: 0.127
    | 2300/ 5000 steps| 29029.31 tokens/s| loss 5.99| acc: 0.126
    | 2400/ 5000 steps| 29082.48 tokens/s| loss 6.01| acc: 0.127
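A likely reason the run "works" but accuracy starts near random and climbs only slowly is that the raw Hugging Face checkpoint uses different parameter names than UER-py, so the pretrained weights are not actually picked up and training effectively starts from scratch. A minimal sketch for comparing the key names (assuming the two checkpoint paths used above):

```python
# Minimal sketch: compare parameter names in the raw Hugging Face checkpoint
# and the UER-converted one. If the names do not match what UER-py expects,
# the pretrained weights are not loaded and training starts from scratch.
import torch

hf_ckpt = torch.load(
    "models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model.bin",
    map_location="cpu",
)
uer_ckpt = torch.load(
    "models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin",
    map_location="cpu",
)

print(sorted(hf_ckpt.keys())[:3])   # Hugging Face naming, e.g. bert.embeddings.*
print(sorted(uer_ckpt.keys())[:3])  # UER-py naming, e.g. embedding.word_embedding.weight
print(set(hf_ckpt.keys()) & set(uer_ckpt.keys()))  # typically empty: no overlap in names
```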

Merry Christmas in advance! Looking forward to your reply.

watermelon-lee commented 3 years ago

One more observation, still related to Q2: if I pre-train directly with the Hugging Face RoBERTa-large model, the saved model is 474 MB, while the original model is 1.3 GB.
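This size difference points in the same direction: roughly 110M float32 parameters (a base-sized model plus the MLM target) is on the order of 0.45 GB, while roughly 330M parameters (a large model) is about 1.3 GB. A quick check, as a sketch assuming the output path used above and that the saved file is a plain state dict:

```python
# Minimal sketch: sum the bytes of all tensors in the saved checkpoint.
# A base-sized model (~110M float32 params) is roughly 0.45 GB; a large one
# (~330M params) is roughly 1.3 GB, matching the observed 474 MB vs 1.3 GB.
import torch

ckpt = torch.load(
    "models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer_mlm.bin",
    map_location="cpu",
)
total_bytes = sum(t.numel() * t.element_size() for t in ckpt.values())
print(f"{total_bytes / 1024 ** 2:.0f} MB")
```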

watermelon-lee commented 3 years ago

Sorry, that was careless of me. I found that pretrain.py has a parameter for the model config, which I forgot to set, so the default BERT-base config was used. My apologies for the trouble!
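For anyone hitting the same issue: it can help to verify that the hidden size in the config file passed to pretrain.py matches the checkpoint before launching training. A minimal sketch, where the config path models/bert_large_config.json and the "hidden_size" key are assumptions to adjust for your UER-py version:

```python
# Minimal sketch: check that the config's hidden size matches the checkpoint
# before training. The config path and the "hidden_size" key are assumptions;
# adjust them to match your UER-py version.
import json
import torch

with open("models/bert_large_config.json") as f:  # assumed large-model config path
    config = json.load(f)

ckpt = torch.load(
    "models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin",
    map_location="cpu",
)

assert config["hidden_size"] == ckpt["embedding.word_embedding.weight"].shape[1], (
    "config hidden_size does not match the checkpoint's embedding size"
)
```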