THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[BUG/Help] Is GLM's bidirectional attention actually necessary? #1463

Open ssgg-code opened 4 months ago

ssgg-code commented 4 months ago

Is there an existing issue for this?

Current Behavior

I have recently been studying the theory behind GLM, and most explanations only roughly state that GLM's bidirectional attention helps the model understand context better. However, unlike BERT's MLM approach, GLM places the masked spans as Part B after Part A, and during pretraining the loss is computed only on the outputs at Part B's positions. So does the bidirectional attention within Part A actually contribute anything? Or is there some other pretraining task involved? (A mask sketch is given below for reference.)
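For reference, here is a minimal sketch of the attention pattern the question describes (not the repository's actual implementation; the helper name `glm_attention_mask` is made up for illustration): Part A tokens attend to all of Part A bidirectionally, while Part B tokens attend to all of Part A plus earlier Part B tokens causally.

```python
# Illustrative sketch only, assuming the Part A / Part B split described above.
import torch

def glm_attention_mask(len_a: int, len_b: int) -> torch.Tensor:
    """Return a boolean mask of shape (L, L) where entry (i, j) is True
    if query position i may attend to key position j."""
    total = len_a + len_b
    mask = torch.zeros(total, total, dtype=torch.bool)
    # Every token (in Part A and Part B) sees all of Part A bidirectionally.
    mask[:, :len_a] = True
    # Part B tokens additionally see themselves and earlier Part B tokens,
    # i.e. a causal (lower-triangular) block over Part B.
    mask[len_a:, len_a:] = torch.tril(torch.ones(len_b, len_b)).bool()
    return mask

if __name__ == "__main__":
    # Example: 4 context tokens (Part A) and 3 masked-span tokens (Part B).
    print(glm_attention_mask(4, 3).int())
```

As stated in the question, only the Part B positions contribute to the pretraining loss; the bidirectional block only governs how Part A positions attend to one another when producing the keys/values that Part B later attends to.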

Expected Behavior

No response

Steps To Reproduce

none

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response