airaria / TextBrewer

A PyTorch-based knowledge distillation toolkit for natural language processing
http://textbrewer.hfl-rc.com
Apache License 2.0
1.6k stars · 239 forks

Ran into the following problem when distilling BERT-wwm-ext; code posted below #30

Closed. cgq0816 closed this issue 3 years ago.

cgq0816 commented 3 years ago
def distill_fit(self, train_df,dev_df,*args):
    self.get_model_config()

    if self.set_en_train:
        if not self.tokenizer:self.tokenizer = BertTokenizer.from_pretrained(self.en_vocab_path)
    else:
        if not self.tokenizer:self.tokenizer = BertTokenizer.from_pretrained(self.bert_path)

    # Check whether a GPU is available
    device=PlatformUtils.get_device()
    logging.info('--------cuda device--------:%s'%(device))
    device='cuda:0'

    self.device = torch.device(device if device else "cpu")

    train_it,dev_it=self.transform(train_df,dev_df,*args)

    # Define models
    bert_config = BertConfig.from_json_file('bert/bert_config/bert_config.json')
    bert_config_T6 = BertConfig.from_json_file('bert/bert_config/bert_config_T6.json')

    bert_config.output_hidden_states = True
    bert_config_T6.output_hidden_states = True  

    bert_config.num_labels=self.num_labels
    bert_config_T6.num_labels=self.num_labels

    #teacher_model = BertForSequenceClassification(bert_config) #, num_labels = 2 self.bert_path
    teacher_model = BertForSequenceClassification.from_pretrained(self.bert_path, num_labels=self.num_labels)
    # Teacher should be initialized with pre-trained weights and fine-tuned on the downstream task.
    # For demonstration purposes, we omit these steps here

    student_model = BertForSequenceClassification(bert_config_T6) #, num_labels = 2

    teacher_model.to(device=self.device)
    student_model.to(device=self.device)

    # Optimizer and learning rate scheduler
    optimizer = AdamW(student_model.parameters(), lr=1e-4)
    scheduler = None
    num_epochs = 30
    num_training_steps = len(train_it) * num_epochs

    scheduler_class = get_linear_schedule_with_warmup
    # arguments dict except 'optimizer'
    scheduler_args = {'num_warmup_steps':int(0.1*num_training_steps), 'num_training_steps':num_training_steps}

    # display model parameters statistics
    print("\nteacher_model's parametrers:")
    result, _ = textbrewer.utils.display_parameters(teacher_model,max_level=3)
    print (result)
    print("student_model's parametrers:")
    result, _ = textbrewer.utils.display_parameters(student_model,max_level=3)
    print (result)
    from functools import partial
    callback_fun = partial(self.validate, eval_dataset=dev_it, device=self.device) # fill other arguments
    # Initialize configurations and distiller
    train_config = TrainingConfig(device=self.device)
    distill_config = DistillationConfig(
        temperature=8,
        hard_label_weight=0,
        kd_loss_type='ce',
        probability_shift=False,
        is_caching_logits=True,
        intermediate_matches=[
            {"layer_T":0, "layer_S":0, "feature":"hidden", "loss":"hidden_mse", "weight":1}, 
           {"layer_T":2, "layer_S":1, "feature":"hidden", "loss":"hidden_mse", "weight":1}, 
           {"layer_T":4, "layer_S":2, "feature":"hidden", "loss":"hidden_mse", "weight":1}, 
           {"layer_T":6, "layer_S":3, "feature":"hidden", "loss":"hidden_mse", "weight":1}, 
           {"layer_T":8, "layer_S":4, "feature":"hidden", "loss":"hidden_mse", "weight":1}, 
           {"layer_T":10,"layer_S":5, "feature":"hidden", "loss":"hidden_mse", "weight":1}, 
           {"layer_T":12,"layer_S":6, "feature":"hidden", "loss":"hidden_mse", "weight":1}]
    )

    print ("train_config:")
    print (train_config)

    print ("distill_config:")
    print (distill_config)

    distiller = GeneralDistiller(
        train_config=train_config, distill_config = distill_config,
        model_T = teacher_model, model_S = student_model, 
        adaptor_T = self.simple_adaptor, adaptor_S = self.simple_adaptor)

    # Start distilling
    with distiller:
        distiller.train(optimizer,train_it, num_epochs=num_epochs, 
        scheduler_class=scheduler_class, scheduler_args = scheduler_args, callback=callback_fun) 
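
The adaptor passed to GeneralDistiller above (self.simple_adaptor) is not shown in the post. A minimal sketch, assuming the tuple outputs of a transformers 2.x BertForSequenceClassification called with labels and output_hidden_states=True, i.e. (loss, logits, hidden_states); the exact indices depend on the installed version:

    def simple_adaptor(self, batch, model_outputs):
        # In this setting model_outputs is (loss, logits, all_hidden_states).
        # 'logits' feeds the soft-label CE loss (kd_loss_type='ce'), and
        # 'hidden' feeds the hidden_mse intermediate_matches configured above.
        return {'logits': model_outputs[1],
                'hidden': model_outputs[2]}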

File "d:\Users\cgq\Anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 550, in call result = self.forward(*input, **kwargs) File "d:\Users\cgq\Anaconda3\Lib\site-packages\transformers\modeling_bert.py", line 211, in forward embeddings = inputs_embeds + position_embeddings + token_type_embeddings

builtins.RuntimeError: The size of tensor a (256) must match the size of tensor b (8) at non-singleton dimension 2

The error ultimately tells me that the encoding dimension from the config does not match the input dimension, and I can't tell where the problem is. Following the data-loading approach in the example, I tokenize the data like this: features = self.distill_tok_collate(df, labelmap)

    # Convert to Tensors and build the dataset
    all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)
    all_input_mask = torch.tensor([f.input_mask for f in features], dtype=torch.long)
    all_segment_ids = torch.tensor([f.segment_ids for f in features], dtype=torch.long)

    all_label_ids = torch.tensor([f.label_id for f in features], dtype=torch.long)

    dataset = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids)
    return dataset  
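
Given the shapes reported later in this thread, a quick sanity check on the collated tensors can catch an extra nesting level early; a minimal sketch (the checks themselves are illustrative, not part of the original code):

    # all_input_ids should be 2-D: [num_examples, max_seq_length].
    # A shape like [num_examples, 1, max_seq_length] turns into the 4-D
    # inputs_embeds ([8, 1, 256, 768]) and the 5-D tensor in
    # transpose_for_scores reported below.
    assert all_input_ids.dim() == 2, all_input_ids.shape
    assert all_input_mask.shape == all_input_ids.shape
    assert all_segment_ids.shape == all_input_ids.shape
    assert all_label_ids.dim() == 1, all_label_ids.shape
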
cgq0816 commented 3 years ago

I misspoke: it is the position embedding dimension that does not match the input dimension. Debug output from my own run:

    inputs_embeds.shape           torch.Size([8, 1, 256, 768])
    position_embeddings.shape     torch.Size([8, 768])
    token_type_embeddings.shape   torch.Size([8, 1, 256, 768])

cgq0816 commented 3 years ago
    return {'input_ids': self.all_input_ids[index],
            'attention_mask': self.all_attention_mask[index],
            'labels': self.all_labels[index]}

Changing the input to dict form solved that, but I've run into a new problem in transpose_for_scores:

    def transpose_for_scores(self, x):
        new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
        x = x.view(*new_x_shape)
        return x.permute(0, 2, 1, 3)

    x.size()   # torch.Size([8, 1, 256, 768])     before the view
    x.size()   # torch.Size([8, 1, 256, 12, 64])  after the view

File "C:\Users\cgq\AppData\Roaming\Python\Python36\site-packages\transformers\modeling_bert.py", line 206, in transpose_for_scores
    return x.permute(0, 2, 1, 3)
RuntimeError: number of dims don't match in permute

After debugging I found that the data has become 5-dimensional. What could be causing this?

airaria commented 3 years ago

Has the problem been solved? It looks like something is wrong with the input order, because normally inputs_embeds would not be passed in, and there should not be an error involving inputs_embeds.

File "d:\Users\cgq\Anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "d:\Users\cgq\Anaconda3\Lib\site-packages\transformers\modeling_bert.py", line 211, in forward
embeddings = inputs_embeds + position_embeddings + token_type_embeddings

builtins.RuntimeError: The size of tensor a (256) must match the size of tensor b (8) at non-singleton dimension 2

Passing the values as a dict does solve this problem. The example was written against an earlier transformers version, and the order of the model's forward arguments may differ in later versions; as long as the order of the items in the dataset matches the order of forward's parameters, it works.

As for the data dimensions after your change, could you check the shape of all_input_ids?
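
For context on the diagnosis above: as the reply implies, a tuple batch reaches the model positionally, while a dict batch is passed as keyword arguments, so dict keys only have to match forward's parameter names. The sketch below (illustrative, not part of the thread) reproduces the shape symptom under a transformers-2.x style signature, where the fourth element of the original 4-tuple lands in position_ids:

    import torch
    from transformers import BertConfig, BertForSequenceClassification

    # Tiny model just for the demonstration.
    config = BertConfig(num_hidden_layers=2)
    model = BertForSequenceClassification(config)

    input_ids = torch.randint(0, config.vocab_size, (8, 256))
    attention_mask = torch.ones_like(input_ids)
    token_type_ids = torch.zeros_like(input_ids)
    label_ids = torch.zeros(8, dtype=torch.long)   # one label per example

    # Equivalent of model(*batch) with batch = (input_ids, input_mask,
    # segment_ids, label_ids): under forward(input_ids, attention_mask,
    # token_type_ids, position_ids, ...) the labels become position_ids,
    # position_embeddings come out as [8, 768], and the embedding addition
    # fails with a size-mismatch RuntimeError like the one reported above.
    try:
        model(input_ids, attention_mask, token_type_ids, label_ids)
    except RuntimeError as e:
        print(e)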

cgq0816 commented 3 years ago

Solved. It was an input problem, most likely caused by a transformers version incompatibility; I'm using 2.9.0.
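
A quick way to check which API the example code is running against (the code in this thread follows the 2.x API, and the reporter resolved the issue on 2.9.0):

    import transformers
    print(transformers.__version__)   # e.g. '2.9.0'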

junrong1 commented 3 years ago

Hi, could you post your bert_config_T6? I've run into a problem similar to yours and don't know how to fix it. I switched the version to 2.9.0, but that didn't solve it either.

cgq0816 commented 3 years ago

Hi, the distillation files are in the attachment; please download them. The experiments were done with the RoBERTa-wwm-ext or BERT-wwm-ext released by HFL.
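
The attached files themselves are not reproduced in this thread. For reference, a 6-layer student configuration of the kind bert_config_T6 usually describes can also be built directly in Python; the values below are a sketch assuming a BERT-base-sized Chinese teacher such as BERT-wwm-ext, and the actual attachment may differ:

    from transformers import BertConfig

    # Illustrative 6-layer student config (not the exact attached file).
    bert_config_T6 = BertConfig(
        vocab_size=21128,            # Chinese BERT-wwm-ext vocabulary size
        hidden_size=768,             # same width as the teacher
        num_hidden_layers=6,         # half of the teacher's 12 layers
        num_attention_heads=12,
        intermediate_size=3072,
        max_position_embeddings=512,
        type_vocab_size=2,
    )
    bert_config_T6.output_hidden_states = True
    bert_config_T6.num_labels = 2    # set to your task's number of labels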

junrong1 commented 3 years ago

I'm hitting this error:

    embeddings = inputs_embeds + position_embeddings + token_type_embeddings
RuntimeError: The size of tensor a (30) must match the size of tensor b (512) at non-singleton dimension 1

    inputs_embeds.size()        = [512, 30, 768]   # batch, seq_len, hid_dim
    position_embeddings.size()  = [512, 768]
    token_type_embeddings       = [512, 30, 768]

May I ask where this return snippet should be added to fix the bug? I still haven't been able to solve the dimension-mismatch problem. I'm using hfl/roberta-wwm-ext.
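
The return in question belongs in the __getitem__ of a small Dataset class that replaces the plain TensorDataset, so that each batch reaches the model as keyword arguments instead of positionally. A minimal sketch along the lines of the earlier comments (the class and attribute names are illustrative):

    from torch.utils.data import Dataset

    class DictDataset(Dataset):
        # Each item is a dict whose keys match the keyword arguments of the
        # model's forward, which sidesteps the positional-order mismatch
        # that produces [batch, hidden]-shaped position embeddings above.
        def __init__(self, all_input_ids, all_attention_mask, all_labels):
            self.all_input_ids = all_input_ids          # [num_examples, max_seq_length]
            self.all_attention_mask = all_attention_mask
            self.all_labels = all_labels                # [num_examples]

        def __len__(self):
            return self.all_input_ids.size(0)

        def __getitem__(self, index):
            return {'input_ids': self.all_input_ids[index],
                    'attention_mask': self.all_attention_mask[index],
                    'labels': self.all_labels[index]}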

cgq0816 commented 3 years ago

Yes, I ran into this before as well; it was caused by too new a transformers version. Take a close look at how my transform loads the data. The problem is solved on my side; see the attachment.

cgq0816 commented 3 years ago

Hi, may I ask whether this problem has been solved? I have closed this issue on my side.
