Kyubyong / transformer

A TensorFlow Implementation of the Transformer: Attention Is All You Need
Apache License 2.0

should the multihead_attention of decoder use causality=False? #61

Open Satan012 opened 5 years ago

gitfourteen commented 5 years ago

In my understanding, the Harvard Annotated Transformer only applies `subsequent_mask` (what is called `causality` here) to the decoder self-attention, not to the encoder-decoder attention.
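A minimal NumPy sketch of such a subsequent (causal) mask, assuming the convention from the Annotated Transformer where `True` means "may attend"; this is illustrative, not this repo's exact TensorFlow code:

```python
import numpy as np

def subsequent_mask(size):
    # Lower-triangular boolean matrix: position i may attend only to
    # positions j <= i, which prevents the decoder from "seeing the future".
    return np.tril(np.ones((size, size), dtype=bool))

mask = subsequent_mask(4)
# In decoder self-attention this mask is applied to the score matrix;
# in encoder-decoder attention no such mask is used (causality=False),
# because the decoder may look at every encoder position.
```

In scaled dot-product attention, positions where the mask is `False` are typically filled with a large negative value before the softmax, so their attention weights become effectively zero.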