ChrisFuscoMasters / TransformerLib

Facilitates my research

Decoder Masking #1

Open CJcool06 opened 1 year ago

CJcool06 commented 1 year ago

The decoder shouldn't be able to attend to future tokens in the sequence; this calls for a causal (look-ahead) mask on the attention scores.
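For reference, a minimal sketch of such a mask in PyTorch (names here are illustrative, not from the current code):

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # Lower-triangular boolean matrix: position i may attend
    # only to positions j <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Typical use, applied to raw attention scores before softmax:
# scores = scores.masked_fill(~causal_mask(scores.size(-1)), float("-inf"))
```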

The current implementation holds a `words` variable in the `MultiHeadAttention` class that must be updated after each token prediction. There should be a better way to do this; perhaps the variable should be moved to the `Transformer` class and passed to `MultiHeadAttention` via function parameters, as in the sketch below.
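One possible shape for that refactor, as a hedged sketch rather than a drop-in patch: all names besides `MultiHeadAttention` and `Transformer` are hypothetical, and the attention here is single-head with heads/dropout omitted for brevity. The point is that the mask is rebuilt from the current sequence length each forward pass, so no stored state needs mutating between predictions.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def forward(self, q, k, v, mask=None):
        # Scaled dot-product attention; mask arrives as a parameter
        # instead of living in instance state.
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        if mask is not None:
            # Block attention to future (masked-out) positions.
            scores = scores.masked_fill(~mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

class Transformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = MultiHeadAttention()

    def decode(self, x):
        # x: (batch, seq_len, d_model). The mask is derived from the
        # input each call, so nothing has to be reset per token.
        seq_len = x.size(1)
        mask = torch.tril(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device)
        )
        return self.attn(x, x, x, mask=mask)
```

This keeps `MultiHeadAttention` stateless with respect to decoding progress, which also makes it reusable for encoder self-attention (pass `mask=None`) and cross-attention.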