Q1: Pretraining with the converted model fails.

Run the conversion script:

python3 scripts/convert_bert_from_huggingface_to_uer.py --input_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model.bin \
    --output_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin \
    --layers_num 24
After converting the data (about 15 MB of corpus), run pretraining:
CUDA_VISIBLE_DEVICES=1,2 python3 pretrain.py --dataset_path dataset.pt --vocab_path models/google_zh_vocab.txt --pretrained_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin \
    --output_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer_mlm.bin --world_size 2 --gpu_ranks 0 1 \
    --total_steps 5000 --save_checkpoint_steps 1000 --embedding bert --encoder bert --target mlm
This raises the following error:

RuntimeError: Error(s) in loading state_dict for Model:
size mismatch for embedding.word_embedding.weight: copying a param with shape torch.Size([21128, 1024]) from checkpoint, the shape in current model is torch.Size([21128, 768]).
size mismatch for embedding.position_embedding.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([512, 768]).

Did something go wrong when converting the BERT-large model?
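One way to narrow this down is to load the converted checkpoint and print the embedding shapes directly. If they already have a trailing dimension of 1024, the conversion itself is fine and the mismatch comes from the model that pretrain.py builds (base-size, hidden 768). This is only a sketch, assuming the converted file is at the path used above:

import torch

# Inspect the converted checkpoint's tensor shapes (diagnostic sketch, not part of UER-py).
state_dict = torch.load(
    "models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin",
    map_location="cpu",
)
for key in ("embedding.word_embedding.weight", "embedding.position_embedding.weight"):
    if key in state_dict:
        print(key, tuple(state_dict[key].shape))
# If both print (..., 1024), the conversion produced large-size weights and the
# RuntimeError means pretrain.py instantiated a base-size (768) model instead.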
Q2: Skip the conversion and pretrain directly from the Hugging Face model:

CUDA_VISIBLE_DEVICES=1,2 python3 pretrain.py --dataset_path dataset.pt --vocab_path models/google_zh_vocab.txt --pretrained_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model.bin \
    --output_model_path models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer_mlm.bin --world_size 2 --gpu_ranks 0 1 \
    --total_steps 5000 --save_checkpoint_steps 1000 --embedding bert --encoder bert --target mlm
This runs without errors, but the model accuracy is very low, which looks a lot like issue #32 "Can we use bert wwm pretrain model directly?". The output is as follows:

| 100/ 5000 steps| 28313.10 tokens/s| loss 9.36| acc: 0.025
| 200/ 5000 steps| 28757.11 tokens/s| loss 8.19| acc: 0.034
| 300/ 5000 steps| 28903.85 tokens/s| loss 7.36| acc: 0.032
| 400/ 5000 steps| 29195.85 tokens/s| loss 6.85| acc: 0.045
| 500/ 5000 steps| 28838.23 tokens/s| loss 6.66| acc: 0.057
| 600/ 5000 steps| 28747.72 tokens/s| loss 6.58| acc: 0.062
| 700/ 5000 steps| 29120.63 tokens/s| loss 6.44| acc: 0.070
| 800/ 5000 steps| 28985.53 tokens/s| loss 6.40| acc: 0.080
| 900/ 5000 steps| 28426.76 tokens/s| loss 6.34| acc: 0.089
| 1000/ 5000 steps| 29011.21 tokens/s| loss 6.27| acc: 0.096
| 1100/ 5000 steps| 28754.28 tokens/s| loss 6.24| acc: 0.097
| 1200/ 5000 steps| 28884.74 tokens/s| loss 6.21| acc: 0.103
| 1300/ 5000 steps| 28976.05 tokens/s| loss 6.23| acc: 0.108
| 1400/ 5000 steps| 29083.27 tokens/s| loss 6.16| acc: 0.111
| 1500/ 5000 steps| 29082.03 tokens/s| loss 6.14| acc: 0.109
| 1600/ 5000 steps| 29118.99 tokens/s| loss 6.12| acc: 0.117
| 1700/ 5000 steps| 28007.29 tokens/s| loss 6.11| acc: 0.117
| 1800/ 5000 steps| 29114.06 tokens/s| loss 6.11| acc: 0.121
| 1900/ 5000 steps| 28985.75 tokens/s| loss 6.08| acc: 0.118
| 2000/ 5000 steps| 29076.29 tokens/s| loss 6.10| acc: 0.123
| 2100/ 5000 steps| 28764.05 tokens/s| loss 6.05| acc: 0.124
| 2200/ 5000 steps| 29038.47 tokens/s| loss 5.98| acc: 0.127
| 2300/ 5000 steps| 29029.31 tokens/s| loss 5.99| acc: 0.126
| 2400/ 5000 steps| 29082.48 tokens/s| loss 6.01| acc: 0.127
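My unconfirmed guess is that when the raw Hugging Face file is passed to pretrain.py, hardly any weights are actually loaded because the parameter names differ (e.g. bert.embeddings.word_embeddings.weight vs. embedding.word_embedding.weight), so training effectively starts from randomly initialized weights, which would explain the slow, from-scratch-looking accuracy curve above. A small sketch to compare the key names, assuming both files exist at the paths used earlier:

import torch

# Load both checkpoints on CPU just to compare their parameter names.
hf = torch.load("models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model.bin", map_location="cpu")
uer = torch.load("models/chinese_roberta_wwm_large_ext_pytorch/pytorch_model_uer.bin", map_location="cpu")

# Print a few keys from each checkpoint to see whether the naming schemes overlap.
print("HF keys:", sorted(hf.keys())[:5])
print("UER keys:", sorted(uer.keys())[:5])

# Count how many parameter names the two checkpoints share; if this is zero,
# loading the raw HF file into the UER model cannot transfer any weights.
print("shared keys:", len(set(hf.keys()) & set(uer.keys())))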
Merry Christmas in advance! Looking forward to your reply.