THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

Why are the dimensions of key_layer and value_layer expanded? #1361

Open lg920810 opened 1 year ago

lg920810 commented 1 year ago

Is there an existing issue for this?

Current Behavior

This is from the ChatGLM2-6B code. Can someone help explain why the dimensions of key_layer and value_layer are expanded here?

```python
if self.multi_query_attention:
    key_layer = key_layer.unsqueeze(-2)
    key_layer = key_layer.expand(
        -1, -1, -1,
        self.num_attention_heads_per_partition // self.num_multi_query_groups_per_partition,
        -1
    )
    key_layer = key_layer.contiguous().view(
        key_layer.size()[:2]
        + (self.num_attention_heads_per_partition, self.hidden_size_per_attention_head)
    )
    value_layer = value_layer.unsqueeze(-2)
    value_layer = value_layer.expand(
        -1, -1, -1,
        self.num_attention_heads_per_partition // self.num_multi_query_groups_per_partition,
        -1
    )
    value_layer = value_layer.contiguous().view(
        value_layer.size()[:2]
        + (self.num_attention_heads_per_partition, self.hidden_size_per_attention_head)
    )
```
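For context, this expansion looks like the usual multi-query / grouped-query attention step of repeating each shared key/value head so that the key and value tensors end up with one head per query head before the attention matmul. Below is a minimal standalone sketch of the same shape manipulation, using small hypothetical sizes (8 query heads sharing 2 KV groups, head dimension 4), which are illustrative assumptions rather than the actual ChatGLM2-6B configuration:

```python
import torch

# Hypothetical sizes for illustration only (not the real ChatGLM2-6B config):
# 8 query heads share 2 key/value groups, head dimension 4.
num_attention_heads = 8
num_multi_query_groups = 2
head_dim = 4
seq_len, batch = 3, 1

# key_layer as produced under multi-query attention: one head per KV group.
key_layer = torch.randn(seq_len, batch, num_multi_query_groups, head_dim)

# Insert a new axis and broadcast it so each KV group is repeated once for
# every query head that shares it (8 // 2 = 4 repeats per group).
expanded = key_layer.unsqueeze(-2).expand(
    -1, -1, -1, num_attention_heads // num_multi_query_groups, -1
)

# Collapse (groups, repeats) into a single head axis so the result has the
# same number of heads as the query tensor.
expanded = expanded.contiguous().view(
    key_layer.size()[:2] + (num_attention_heads, head_dim)
)

print(key_layer.shape)  # torch.Size([3, 1, 2, 4])
print(expanded.shape)   # torch.Size([3, 1, 8, 4])
```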

Expected Behavior

No response

Steps To Reproduce

None

Environment

- OS: Ubuntu 18.04
- Python: 3.8
- Transformers:
- PyTorch: 2.0.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`):

Anything else?

No response