The decoder shouldn't be able to see future tokens in the sequence.
The current implementation stores a `words` variable in the `MultiHeadAttention` class that must be mutated after each token prediction. There should be a cleaner way to do this; perhaps the variable should live in the `Transformer` class and be passed to `MultiHeadAttention` as a function parameter instead of being held as mutable state.
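One stateless alternative is to derive a causal mask from the sequence length inside each forward call, so no per-step variable needs updating at all. The sketch below is a minimal illustration using numpy, not the actual implementation; the function and variable names (`causal_mask`, `masked_attention`) are hypothetical:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    # True above the diagonal marks future positions to hide.
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

def masked_attention(q, k, v, mask):
    # Scaled dot-product attention; masked scores are pushed to -inf
    # before softmax so future tokens receive zero weight.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
out, w = masked_attention(x, x, x, causal_mask(5))
# Each position attends only to itself and earlier positions.
assert np.allclose(np.triu(w, k=1), 0.0)
```

Because the mask is recomputed from the input shape, `MultiHeadAttention` would need no stored `words` variable; the caller (e.g. the `Transformer` class) simply passes the mask in with each call.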