PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
12.19k stars 2.95k forks

[Question]: Why is an attn_mask checked for being causal by comparing against its upper-triangular matrix? #8309

Closed runzhech closed 6 months ago

runzhech commented 7 months ago

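Please ask your question

The file PaddleNLP/paddlenlp/transformers/llama/modeling.py provides methods for generating a causal_mask and for checking whether a mask is causal. A causal_mask should be a lower-triangular matrix, so why does the check for whether a mask is causal take the upper triangle of the mask and compare against that?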

w5688414 commented 7 months ago

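This is done to accommodate the zero-padding strategy. The values in the upper triangle are negative infinity, and they lie outside the region that zero-padding touches (zero-padding changes values in the lower triangle), so it is more convenient to do the check via the upper triangle.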
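To make the point above concrete, here is a minimal pure-Python sketch. It is not the code from modeling.py. It assumes that the mask is additive (0 where attention is allowed, negative infinity where it is blocked) and that zero-padding means packing several sequences into one row so that tokens only attend within their own sequence. All function names (`make_causal_mask`, `make_packed_mask`, `triu`, `strict_upper`, `looks_causal`) are hypothetical and exist only for illustration.

```python
NEG_INF = float("-inf")

def make_causal_mask(n):
    """Additive causal mask: 0.0 on and below the diagonal, -inf above it."""
    return [[0.0 if j <= i else NEG_INF for j in range(n)] for i in range(n)]

def make_packed_mask(seq_lens):
    """Additive mask for several sequences packed into one row.

    Assumption: a token may attend only to earlier tokens of its own sequence,
    so some entries *below* the diagonal also become -inf.
    """
    seg = [s for s, length in enumerate(seq_lens) for _ in range(length)]
    n = len(seg)
    return [
        [0.0 if (j <= i and seg[i] == seg[j]) else NEG_INF for j in range(n)]
        for i in range(n)
    ]

def triu(mask):
    """Keep the diagonal and everything above it; set everything below to 0.0."""
    return [[v if j >= i else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(mask)]

def strict_upper(mask):
    """Entries strictly above the diagonal, row by row."""
    return [[v for j, v in enumerate(row) if j > i] for i, row in enumerate(mask)]

def looks_causal(mask):
    """True when the mask equals its own upper triangle, i.e. when every
    entry below the diagonal is 0.0 (hypothetical check, for illustration)."""
    return triu(mask) == mask

causal = make_causal_mask(4)
packed = make_packed_mask([2, 2])  # two sequences of length 2 in one row

print(strict_upper(causal) == strict_upper(packed))  # upper triangle is the same
print(looks_causal(causal), looks_causal(packed))
```

In this sketch the region strictly above the diagonal is negative infinity in both masks, and only the lower triangle changes when sequences are packed. Comparing a mask with its own upper triangle therefore tells a plain causal mask apart from a packed one without having to rebuild a reference lower-triangular matrix.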