Open bjourne opened 4 years ago
I've stared at these lines in your excellent tutorial for a while now: `enc_padding_mask` and `dec_padding_mask` will always be equal. Is this intentional? It seems weird to create two different padding masks that are the same.

Yes, I believe so. These two masks are there to mask out the padding tokens in the input sentence; see http://nlp.seas.harvard.edu/2018/04/03/attention.html#batches-and-masking

Oh, I see. But then wouldn't it be more efficient to use only one mask?
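For anyone landing on this issue later, here's a minimal sketch of the pattern being discussed. The names follow the common TensorFlow Transformer tutorial (an assumption on my part, since the issue doesn't quote the code): both masks are computed from the same input tensor `inp`, which is why they always come out equal. The decoder's encoder-decoder attention block attends over the *encoder's output*, so its padding mask is also derived from the input sequence, not the target.

```python
import tensorflow as tf

def create_padding_mask(seq):
    # 1.0 where the token id is 0 (padding), 0.0 elsewhere.
    mask = tf.cast(tf.math.equal(seq, 0), tf.float32)
    # Add extra dims so the mask broadcasts over the attention
    # logits: (batch_size, 1, 1, seq_len).
    return mask[:, tf.newaxis, tf.newaxis, :]

def create_masks(inp):
    # Used in the encoder's self-attention.
    enc_padding_mask = create_padding_mask(inp)
    # Used in the decoder's second (encoder-decoder) attention
    # block; it masks the encoder output, so it is built from
    # the same `inp` tensor and is therefore identical.
    dec_padding_mask = create_padding_mask(inp)
    # (The decoder's look-ahead mask, built from the target
    # sequence, is omitted here for brevity.)
    return enc_padding_mask, dec_padding_mask

inp = tf.constant([[5, 7, 0, 0]])
enc_mask, dec_mask = create_masks(inp)
# Both are [[[[0., 0., 1., 1.]]]] -- always equal by construction.
```

So computing the mask once and passing it to both places would give the same result; keeping two names just documents *where* each mask is applied.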