DA-southampton / Read_Bert_Code

Reading and explaining the BERT source code (PyTorch version), using BERT text-classification code as the example

How should the logic of the transpose_for_scores function in BertSelfAttention be understood? #4

Closed — BlueSkyBubble closed this 3 years ago

BlueSkyBubble commented 3 years ago

Hello. In BertSelfAttention, hidden_states is passed through the Q, K, and V projection matrices to produce mixed_query_layer, mixed_key_layer, and mixed_value_layer, respectively. My question is: why do all three of these results then go through the transpose_for_scores function? In particular, how should I understand the line new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size) inside transpose_for_scores?

Or to put it another way: why does new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size) implement multi-head attention?
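
For reference, here is a minimal, self-contained sketch of the shape manipulation the question is asking about. It is not the repo's exact code; the sizes below (hidden_size=768, 12 heads of 64) are BERT-base defaults used here as assumptions. The key idea: x.size()[:-1] keeps (batch, seq_len) and the appended tuple splits the last dimension into (heads, head_size), so view reinterprets each 768-dim vector as 12 chunks of 64 without moving any data; the permute then brings the head axis forward so the subsequent batched matmul computes attention scores per head.

```python
import torch

# Assumed BERT-base sizes (hypothetical example values, not from the repo)
batch_size, seq_len, hidden_size = 2, 8, 768
num_attention_heads = 12
attention_head_size = hidden_size // num_attention_heads  # 64

def transpose_for_scores(x):
    # x: [batch, seq_len, hidden_size]
    # x.size()[:-1] is (batch, seq_len); appending
    # (num_attention_heads, attention_head_size) splits hidden_size
    # into 12 heads x 64 dims -- a pure reinterpretation, no data copy.
    new_x_shape = x.size()[:-1] + (num_attention_heads, attention_head_size)
    x = x.view(*new_x_shape)          # [batch, seq_len, heads, head_size]
    # Move heads before seq_len so each head attends over the sequence
    # independently in the batched matmul below.
    return x.permute(0, 2, 1, 3)      # [batch, heads, seq_len, head_size]

mixed_query_layer = torch.randn(batch_size, seq_len, hidden_size)
mixed_key_layer = torch.randn(batch_size, seq_len, hidden_size)

query_layer = transpose_for_scores(mixed_query_layer)
key_layer = transpose_for_scores(mixed_key_layer)
print(query_layer.shape)  # torch.Size([2, 12, 8, 64])

# With this layout, matmul produces one seq x seq score map per head:
scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
print(scores.shape)       # torch.Size([2, 12, 8, 8])
```

In other words, the single large Q/K/V projections already contain all heads concatenated along the hidden dimension; transpose_for_scores merely carves that dimension into per-head slices and rearranges the axes so the heads become an extra batch dimension.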