Closed by triton99 3 months ago
Hi @X-Lai, thanks for sharing this great work!
What is the purpose of `attn_masks` in your transformer decoder? In your paper, you describe the model as a mask-attention-free transformer.
https://github.com/dvlab-research/Mask-Attention-Free-Transformer/blob/4b5048c0e08c2fc42f660dfea3209043179ace1b/maft/model/transformer.py#L108-L115
Thank you.
If I understand correctly, only the first cross-attention layer is unmasked; all subsequent layers use the attention masks.
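For readers skimming the thread, here is a minimal sketch (not the repo's actual code) of the pattern described above: the first cross-attention layer attends to all points with `attn_mask=None`, while each later layer restricts attention using a mask predicted from the previous layer's queries. Names such as `pred_to_attn_mask` and `mask_head` are hypothetical.

```python
import torch
import torch.nn as nn

num_layers, num_queries, num_points, d_model, nhead = 3, 8, 128, 32, 4

cross_attn_layers = nn.ModuleList(
    nn.MultiheadAttention(d_model, nhead, batch_first=True)
    for _ in range(num_layers)
)
mask_head = nn.Linear(d_model, d_model)  # hypothetical per-query mask predictor


def pred_to_attn_mask(queries, feats):
    # Boolean mask with PyTorch convention: True = position is NOT attended.
    logits = torch.einsum("bqc,bpc->bqp", mask_head(queries), feats)
    mask = logits.sigmoid() < 0.5
    # Repeat per head: MultiheadAttention expects shape (batch*nhead, Q, P).
    mask = mask.repeat_interleave(nhead, dim=0)
    # Guard: if a query would mask out everything, let it attend everywhere.
    mask[mask.all(dim=-1)] = False
    return mask


queries = torch.randn(1, num_queries, d_model)
feats = torch.randn(1, num_points, d_model)

attn_mask = None  # first layer: unmasked cross-attention
for layer in cross_attn_layers:
    queries, _ = layer(queries, feats, feats, attn_mask=attn_mask)
    attn_mask = pred_to_attn_mask(queries, feats)  # consumed by the next layer
```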