In the chapter11_part04_sequence-to-sequence-learning.ipynb notebook, the TransformerDecoder receives the mask from the PositionalEmbedding layer of the target sequence:
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
x = TransformerDecoder(embed_dim, dense_dim, num_heads)(x, encoder_outputs)
Shouldn’t the mask be the one created from encoding the source sequence?
For example, in this TF tutorial the mask from the source sequence is used instead.
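To make the question concrete, here is a minimal NumPy sketch of which padding mask applies where. The toy data and the `causal_mask`/`padding_mask` helpers are purely illustrative, not the notebook's actual API:

```python
import numpy as np

def causal_mask(n):
    # Lower-triangular matrix: position i may attend to positions <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def padding_mask(token_ids):
    # True where a real token is present, False where the id is padding (0).
    return token_ids != 0

# Hypothetical toy batch: source and target sequences padded with 0.
source = np.array([3, 7, 0, 0])   # source sequence ids
target = np.array([5, 2, 9, 0])   # target sequence ids

# Decoder self-attention: the causal mask combined with the *target*
# padding mask -- this is the mask the notebook's wiring provides.
self_attn_mask = causal_mask(4) & padding_mask(target)[None, :]

# Cross-attention: queries come from the target, keys/values from the
# encoder output, so the relevant padding mask would be the *source*
# one -- which is what the TF tutorial appears to use there.
cross_attn_mask = padding_mask(target)[:, None] & padding_mask(source)[None, :]
```

The two masks differ exactly at the padded source positions, which is the crux of the question.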
Any clarification would be greatly appreciated.