[BUG/Help] <title>Is GLM's bidirectional attention necessary?

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

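I have recently been studying the theory behind GLM, and most explanations only say, roughly, that GLM's bidirectional attention helps the model understand context better. Unlike BERT's MLM, however, GLM moves the masked spans into Part B, which is placed after Part A, and during pre-training the loss is computed only from the outputs at the Part B positions. So does the bidirectional attention over Part A actually contribute anything? Or are other pre-training tasks involved as well?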
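To make the question concrete, here is a minimal sketch of the attention pattern as I understand it: Part A tokens attend to each other in both directions, while Part B tokens attend to all of Part A plus the Part B tokens before them. The function name `glm_attention_mask` and the lengths used are my own illustration and are not taken from the ChatGLM-6B code.

```python
def glm_attention_mask(len_a, len_b):
    """Build a (len_a + len_b) x (len_a + len_b) mask as nested lists.

    mask[i][j] == 1 means query position i may attend to key position j.
    Assumed layout: Part A occupies positions [0, len_a), Part B follows it.
    """
    total = len_a + len_b
    return [
        # Every query may see all of Part A (j < len_a); beyond that,
        # attention is causal (j <= i), which only matters inside Part B.
        [1 if (j < len_a or j <= i) else 0 for j in range(total)]
        for i in range(total)
    ]


mask = glm_attention_mask(len_a=3, len_b=2)
for row in mask:
    print(row)
# [1, 1, 1, 0, 0]
# [1, 1, 1, 0, 0]
# [1, 1, 1, 0, 0]
# [1, 1, 1, 1, 0]
# [1, 1, 1, 1, 1]
```

The first three rows are Part A (fully visible to each other, blind to Part B), and the last two rows are Part B, where the loss is computed. My question is whether the bidirectional block in the top-left corner matters, given that only the Part B rows feed the loss.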

Expected Behavior

No response

Steps To Reproduce

none

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

THUDM / ChatGLM-6B

[BUG/Help] <title>Is GLM's bidirectional attention necessary? #1463
