[Question] Why does the Attention module in the Baichuan model not use attention_mask during training?

Required prerequisites

[X] I have read the documentation https://github.com/baichuan-inc/baichuan-7B/blob/HEAD/README.md.
[X] I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
[X] Consider asking first in a Discussion.

Questions

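The project states: "The overall model is based on the standard Transformer architecture, and we adopt the same model design as LLaMA." However, I found that the Attention class in modeling_baichuan.py does not use attention_mask during training, whereas the Attention in LLaMA does. Why is that?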
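To make the question concrete, here is a minimal sketch of the two masking schemes being contrasted: a purely causal (lower-triangular) mask, and a causal mask combined with a padding-style attention_mask. It is not taken from modeling_baichuan.py or from the LLaMA code. The helper names (`causal_mask`, `combine_with_padding`, `rows_differing`) and the example masks are hypothetical and exist only for illustration.

```python
# Illustrative sketch only; these helpers are hypothetical and do not come
# from modeling_baichuan.py or any LLaMA implementation.

def causal_mask(seq_len):
    """allowed[i][j] is True when query position i may attend to key j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def combine_with_padding(causal, attention_mask):
    """Additionally block every key position whose attention_mask entry is 0."""
    return [
        [allowed and attention_mask[j] == 1 for j, allowed in enumerate(row)]
        for row in causal
    ]


def rows_differing(mask_a, mask_b):
    """Indices of query positions whose allowed-key sets differ between two masks."""
    return [i for i, (ra, rb) in enumerate(zip(mask_a, mask_b)) if ra != rb]


seq_len = 5
causal = causal_mask(seq_len)

# Assumed example: 3 real tokens followed by 2 padding tokens (right padding).
right_padded = [1, 1, 1, 0, 0]
differing_right = rows_differing(causal, combine_with_padding(causal, right_padded))

# Assumed example: 2 padding tokens followed by 3 real tokens (left padding).
left_padded = [0, 0, 1, 1, 1]
differing_left = rows_differing(causal, combine_with_padding(causal, left_padded))

print("right padding, rows that change:", differing_right)
print("left padding, rows that change:", differing_left)
```

In this sketch, with right padding only the padding positions themselves see a different mask, so a causal-only mask gives the same result for the real tokens as long as the loss at padding positions is ignored. With left padding every row changes. Whether this is the reason attention_mask is dropped in the training path is exactly what I would like to have confirmed.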

Checklist

[X] I have provided all relevant and necessary information above.
[X] I have chosen a suitable title for this issue.

baichuan-inc / Baichuan-7B

[Question] Why does the Attention module in the Baichuan model not use attention_mask during training? #111
