I have been studying attention recently, and I have some doubts about how the attention weights are calculated in the PyTorch NLP attention tutorial: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html.
In the tutorial, the attention scores (weights) are calculated from the decoder's input and the decoder's hidden state. However, as far as I can tell, neither Luong nor Bahdanau does it that way: both compute the weights from the decoder hidden state and the ENCODER outputs. Why does the PyTorch tutorial do it differently? Which approach is the right one?
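For reference, this is the scoring scheme I understand from the papers, written as a minimal sketch of Bahdanau-style additive attention. This is my own illustration, not the tutorial's code; the layer names (`W_dec`, `W_enc`, `v`) and sizes are assumptions I chose for clarity.

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive attention sketch: the scores come from the decoder
    hidden state and the ENCODER outputs, not the decoder input."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W_dec = nn.Linear(hidden_size, hidden_size)  # projects decoder state
        self.W_enc = nn.Linear(hidden_size, hidden_size)  # projects encoder outputs
        self.v = nn.Linear(hidden_size, 1)                # scores each source position

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden: (batch, hidden); enc_outputs: (batch, src_len, hidden)
        scores = self.v(torch.tanh(
            self.W_dec(dec_hidden).unsqueeze(1) + self.W_enc(enc_outputs)
        )).squeeze(-1)                          # (batch, src_len)
        weights = torch.softmax(scores, dim=-1) # attention distribution over source
        # context vector: weighted sum of encoder outputs
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
        return context, weights

attn = BahdanauAttention(hidden_size=8)
dec_h = torch.randn(2, 8)       # decoder hidden state, batch of 2
enc_out = torch.randn(2, 5, 8)  # encoder outputs for a source of length 5
context, weights = attn(dec_h, enc_out)
```

Here `weights` has shape `(batch, src_len)` and sums to 1 over the source positions, so it can be applied directly to the encoder outputs; the tutorial instead produces a fixed-length weight vector from the decoder input embedding and hidden state, which is what confused me.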