Open mingliangzhang2018 opened 3 years ago
Excuse me, could you tell me that the attn_mask for Masked Multi-Head Attention only use sequence mask but ignore the padding mask? To my konwledge, padding mask is necessay for eliminating the effect of padding.
Excuse me, could you tell me that the attn_mask for Masked Multi-Head Attention only use sequence mask but ignore the padding mask? To my konwledge, padding mask is necessay for eliminating the effect of padding.