649453932 / Bert-Chinese-Text-Classification-Pytorch

Chinese text classification with Bert and ERNIE

RuntimeError: Error(s) in loading state_dict for BertModel #55

Open SUFEHeisenberg opened 4 years ago

SUFEHeisenberg commented 4 years ago

Hello! Recently I have used your PyTorch framework to load many pretrained models and fine-tune them on my own tasks, all successfully, but none of the ALBERT-family models work. The error is as follows:

$ python run.py --model albert_base_bright
Loading data...
401it [00:04, 96.21it/s]
140it [00:01, 101.19it/s]
135it [00:01, 86.25it/s]
Time usage: 0:00:07
Traceback (most recent call last):
  File "run.py", line 39, in <module>
    model = x.Model(config).to(config.device)
  File "F:\PycharmProjects\Bert-Chinese-Text-Classification-Pytorch-master\models\albert_base_bright.py", line 40, in __init__
    self.bert = BertModel.from_pretrained(config.bert_path,config=model_config)
  File "D:\anaconda3\lib\site-packages\pytorch_transformers\modeling_utils.py", line 594, in from_pretrained
    model.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BertModel:
        size mismatch for bert.embeddings.word_embeddings.weight: copying a param with shape torch.Size([21128, 128]) from checkpoint, the shape in current model is torch.Size([21128, 768]).

The config.json of albert_base_bright is as follows:

{
  "attention_probs_dropout_prob": 0.0,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "embedding_size": 128,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21128,
  "ln_type": "postln"
}
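(Editor's note: the mismatch comes from ALBERT's factorized embedding. The config above sets embedding_size to 128 while hidden_size is 768, so the checkpoint stores word_embeddings of shape [21128, 128], whereas a plain BertModel expects [21128, 768]. A minimal sketch, not from the original thread, of loading such a checkpoint with a class that understands the factorization, e.g. AlbertModel from the newer transformers package; the local path is a placeholder and it is assumed the weights are already in transformers' PyTorch ALBERT format.)

# Minimal sketch (editor's addition): load an ALBERT checkpoint with a model
# class that supports the factorized embedding, so no shape mismatch occurs.
# "./albert_base_bright" is a placeholder directory and is assumed to contain
# a converted pytorch_model.bin plus config.json.
import os
from transformers import AlbertConfig, AlbertModel

bert_path = "./albert_base_bright"
model_config = AlbertConfig.from_json_file(os.path.join(bert_path, "config.json"))
albert = AlbertModel.from_pretrained(bert_path, config=model_config)

# The word embedding keeps the factorized size from the checkpoint.
print(albert.embeddings.word_embeddings.weight.shape)  # expected: torch.Size([21128, 128])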

Among the ALBERT-family models, only albert_xxlarge_zh loads successfully; its JSON config file is as follows:

{
  "attention_probs_dropout_prob": 0,
  "hidden_act": "relu",
  "hidden_dropout_prob": 0,
  "embedding_size": 128,
  "hidden_size": 4096,
  "initializer_range": 0.01,
  "intermediate_size": 16384,
  "max_position_embeddings": 512,
  "num_attention_heads": 16,
  "num_hidden_layers": 12,
  "num_hidden_groups": 1,
  "net_structure_type": 0,
  "layers_to_keep": [],
  "gap_size": 0,
  "num_memory_blocks": 0,
  "inner_group_num": 1,
  "down_scale_factor": 1,
  "type_vocab_size": 2,
  "vocab_size": 21128
}

I found this issue on GitHub, so I used HuggingFace's pytorch_transformers to load the model:

import os
import torch.nn as nn
from pytorch_transformers import BertModel, BertConfig, BertTokenizer

class Model(nn.Module):

    def __init__(self, config):
        super(Model, self).__init__()
        model_config = BertConfig.from_json_file(os.path.join(config.bert_path, 'config.json'))
        self.bert = BertModel.from_pretrained(config.bert_path, config=model_config)
        for param in self.bert.parameters():
            param.requires_grad = True
        self.fc = nn.Linear(config.hidden_size, config.num_classes)

    def forward(self, x):
        context = x[0]  # the input sentence (token ids)
        mask = x[2]  # mask over the padding part, same size as the sentence, 0 for padding positions, e.g. [1, 1, 1, 1, 0, 0]
        _, pooled = self.bert(context, attention_mask=mask, output_all_encoded_layers=False)
        out = self.fc(pooled)
        return out
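(Editor's note: in pytorch_transformers, BertModel.forward no longer accepts output_all_encoded_layers; it returns a tuple whose first two elements are the sequence output and the pooled output. A hedged sketch of the adapted forward:)

    def forward(self, x):
        context = x[0]  # the input sentence (token ids)
        mask = x[2]     # attention mask, 0 on padding positions
        # pytorch_transformers returns a tuple; unpack sequence output and pooled output
        sequence_output, pooled = self.bert(context, attention_mask=mask)[:2]
        out = self.fc(pooled)
        return out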

Have you encountered this error before? What causes it? Do I need to convert the checkpoint first with the convert_to_pytorch scripts? I hope you can find time to reply. Thanks a lot!

zhoumo580691212 commented 2 years ago

Hello, I ran into the same situation today. I wanted to run with ALBERT; after changing self.hidden_size in models/albert.py to 312, it matches albert-tiny and runs through successfully.
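(Editor's note: the point of that fix is that the classifier head must match the checkpoint's hidden size. A minimal sketch with illustrative names and values:)

import torch.nn as nn

class Config:
    hidden_size = 312   # albert-tiny pooled-output size; 768 for albert_base, 4096 for albert_xxlarge_zh
    num_classes = 10    # example value, set to your own label count

config = Config()
fc = nn.Linear(config.hidden_size, config.num_classes)  # must match the last dimension of the pooled output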