Closed herbiezhao closed 2 years ago
As I understand this code, every sample in a batch shares the same mask.
Yes, it is consistent with the description in the paper.
If the samples in a batch use different masks, does the code need to be modified?
The code needs to be modified to use different masks for different samples.
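To illustrate the difference, here is a minimal sketch in NumPy. It is not the code from ContextualAttention: the function names, the flattened shapes, and the convention that 1 marks a masked location are all assumptions made for this example. It only shows the general idea of replacing one mask that is broadcast over the batch with a mask indexed per sample.

```python
import numpy as np


def apply_shared_mask(scores, mask):
    """Hypothetical helper: one mask for the whole batch.

    scores: (N, L) matching scores, one row per sample.
    mask:   (L,) array, 1 = masked location, 0 = valid location (assumed convention).
    """
    valid = 1.0 - mask      # shape (L,)
    return scores * valid   # broadcasts the same mask over all N samples


def apply_per_sample_masks(scores, masks):
    """Hypothetical helper: a different mask for each sample.

    masks: (N, L) array, row i is the mask of sample i.
    """
    out = np.empty_like(scores)
    for i in range(scores.shape[0]):
        valid = 1.0 - masks[i]        # mask of sample i only
        out[i] = scores[i] * valid
    return out


scores = np.ones((2, 4))
masks = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]])

shared = apply_shared_mask(scores, masks[0])         # sample 1 wrongly gets sample 0's mask
per_sample = apply_per_sample_masks(scores, masks)   # each sample gets its own mask
```

In a real layer the mask would be a 4-D tensor and the per-sample step would sit wherever the layer already processes one sample at a time; the loop above stands in for that.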
code in ContextualAttention