bryanlimy / tf2-transformer-chatbot

Transformer Chatbot in TensorFlow 2 with TPU support.
MIT License

Possible bug in the padding mask handling #14

bjourne opened this issue 4 years ago (status: Open)

bjourne commented 4 years ago

I've stared at these lines in your excellent tutorial for a while now:

  # mask the padding tokens in the encoder inputs
  enc_padding_mask = tf.keras.layers.Lambda(
      create_padding_mask, output_shape=(1, 1, None),
      name='enc_padding_mask')(inputs)
  # mask the future tokens for decoder inputs at the 1st attention block
  look_ahead_mask = tf.keras.layers.Lambda(
      create_look_ahead_mask,
      output_shape=(1, None, None),
      name='look_ahead_mask')(dec_inputs)
  # mask the encoder outputs for the 2nd attention block
  dec_padding_mask = tf.keras.layers.Lambda(
      create_padding_mask, output_shape=(1, 1, None),
      name='dec_padding_mask')(inputs)

Since both Lambda layers apply create_padding_mask to the same inputs tensor, enc_padding_mask and dec_padding_mask will always be equal. Is this intentional? It seems weird to create two separate padding masks that end up identical.
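For reference, both mask helpers follow the TensorFlow transformer tutorial that this repo is based on. A minimal sketch of what they compute (the exact bodies in the repo may differ slightly):

  import tensorflow as tf

  def create_padding_mask(x):
    # 1.0 where the token id equals the padding id (assumed to be 0), 0.0 elsewhere
    mask = tf.cast(tf.math.equal(x, 0), tf.float32)
    # reshape to (batch_size, 1, 1, sequence_length) so the mask broadcasts
    # over attention heads and query positions
    return mask[:, tf.newaxis, tf.newaxis, :]

  def create_look_ahead_mask(x):
    # upper-triangular part hides future positions from each decoder query
    seq_len = tf.shape(x)[1]
    look_ahead_mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
    # combined with the padding mask of the decoder inputs
    padding_mask = create_padding_mask(x)
    return tf.maximum(look_ahead_mask, padding_mask)

Note the convention: 1 marks a position to mask out, 0 marks a visible token.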

bryanlimy commented 4 years ago

Yes, I believe so. Both masks are there to mask out the padding tokens in the input sentence; see http://nlp.seas.harvard.edu/2018/04/03/attention.html#batches-and-masking
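To spell that out: with the tutorial's convention, a mask value of 1 pushes the corresponding attention logit towards -inf before the softmax. A rough sketch of how the mask is consumed in scaled dot-product attention (simplified, not the repo's exact code):

  def scaled_dot_product_attention(query, key, value, mask):
    # attention logits, shape (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(query, key, transpose_b=True)
    depth = tf.cast(tf.shape(key)[-1], tf.float32)
    logits = matmul_qk / tf.math.sqrt(depth)
    if mask is not None:
      # masked positions (padding or future tokens) get a large negative
      # logit and therefore ~0 weight after the softmax
      logits += mask * -1e9
    attention_weights = tf.nn.softmax(logits, axis=-1)
    return tf.matmul(attention_weights, value)

In the encoder's self-attention the keys come from inputs; in the decoder's second attention block the keys come from the encoder outputs, which are aligned position-for-position with inputs. In both cases the positions to hide are the padding tokens of inputs, which is why the two masks coincide.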

bjourne commented 4 years ago

Oh, I see. But then wouldn't it be more efficient to use only one mask?
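For later readers: the change suggested here would look roughly like the sketch below, i.e. compute the padding mask once and pass it to both the encoder and the decoder's second attention block. The encoder/decoder call shapes are assumed from the tutorial and untested here.

  # a single padding mask computed from the encoder inputs
  padding_mask = tf.keras.layers.Lambda(
      create_padding_mask, output_shape=(1, 1, None),
      name='padding_mask')(inputs)

  # then pass `padding_mask` wherever enc_padding_mask and dec_padding_mask
  # were passed before, e.g. (signatures assumed):
  #   enc_outputs = encoder(...)(inputs=[inputs, padding_mask])
  #   dec_outputs = decoder(...)(inputs=[dec_inputs, enc_outputs,
  #                                      look_ahead_mask, padding_mask])

Functionally nothing changes, since the two masks were identical anyway; it just avoids computing create_padding_mask twice and keeps the graph slightly smaller.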