Open thakursc1 opened 4 years ago
Same question.

Also, the encoder outputs are passed, while in the blog post it's mentioned:

> The first one is the dot scoring function. This is the simplest of the functions; to produce the alignment score we only need to take the hidden states of the encoder and multiply them by the hidden state of the decoder.

I see a potential disconnect here.
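For context, here is a minimal sketch of what the quoted dot scoring function computes. The tensor shapes are illustrative assumptions (batch of 1, 4 encoder time steps, hidden size 256), not the blog's exact code:

```python
import torch

# Assumed shapes: batch of 1, 4 encoder time steps, hidden size 256
encoder_hidden_states = torch.randn(1, 4, 256)  # hidden state at every encoder time step
decoder_hidden = torch.randn(1, 1, 256)         # current decoder hidden state

# Dot score: multiply each encoder hidden state with the decoder hidden state,
# giving one alignment score per encoder time step
alignment_scores = torch.bmm(decoder_hidden, encoder_hidden_states.transpose(1, 2))
print(alignment_scores.shape)  # torch.Size([1, 1, 4])
```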
Hi @sayakpaul, sorry for the confusion; perhaps the naming convention is misleading.

According to the docs, the PyTorch LSTM produces `(output, (h_n, c_n))`:

- `output` is defined as: the output features `(h_t)` from the last layer of the LSTM, for each `t`. This means that the `output` from the forward call of the LSTM holds the hidden state at every time step of the input sequence. In our case, the `encoder_outputs` are actually the hidden states of every time step.
- `h_n` is the hidden state for `t = seq_len`. This indicates that the `h_n` output only holds the hidden state of the last time step.

Therefore, when we call `encoder_outputs, (h_n, c_n) = encoder(inp, h)`, we'll see that the outputs have the shapes:

- `encoder_outputs`: `torch.Size([1, 4, 256])`
- `h_n`: `torch.Size([1, 1, 256])`

and if we run `torch.equal(encoder_outputs[:, -1, :], h_n[0])`, it will return `True`, which confirms that the last time step of `encoder_outputs` is essentially the same hidden state as `h_n`.

In our case, we are interested in the hidden states of the encoder LSTM at every time step when calculating our attention weights, so we have to use the `output` from the LSTM (which is essentially the hidden state of every time step) rather than `hidden`.
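To make this concrete, here is a self-contained sketch of that check. A plain `nn.LSTM` with an assumed input size of 100 stands in for the blog's encoder; the shapes mirror the ones above:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=256, batch_first=True)
inp = torch.randn(1, 4, 100)  # batch of 1, sequence of 4 time steps

encoder_outputs, (h_n, c_n) = lstm(inp)

print(encoder_outputs.shape)  # torch.Size([1, 4, 256]) - hidden state at every step
print(h_n.shape)              # torch.Size([1, 1, 256]) - hidden state of the last step only

# The last time step of `output` is the same hidden state as `h_n`
print(torch.equal(encoder_outputs[:, -1, :], h_n[0]))  # True
```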
If I understand the logic correctly, then in the Luong decoder's forward function:

`alignment_scores = self.attention(lstm_out, encoder_outputs)`

Shouldn't we pass `hidden` instead of `lstm_output` to `self.attention`?
Hi @thakursc1, I apologise for the confusion as well.

Yes, the hidden state of the LSTM should be passed into the attention module, and I've modified my code to `self.attention(hidden[0], encoder_outputs)`. I believe the reason it still worked is that `lstm_out` is essentially the hidden states of every time step of the LSTM, as explained above, and since the sequence length passed into the decoder LSTM is 1, `lstm_out` and `hidden` carry the same values. (If we run `torch.equal(hidden[0], lstm_out)` in the forward call of the Luong decoder, we'll see that the values are the same.)
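A quick way to see this, as a sketch assuming a single-layer, `batch_first` LSTM fed one token per decoding step (as in the decoder):

```python
import torch
import torch.nn as nn

decoder_lstm = nn.LSTM(input_size=256, hidden_size=256, batch_first=True)

embedded = torch.randn(1, 1, 256)  # one decoding step: seq_len == 1
h0 = torch.zeros(1, 1, 256)
c0 = torch.zeros(1, 1, 256)

lstm_out, hidden = decoder_lstm(embedded, (h0, c0))

# With seq_len == 1 and batch size 1, both tensors have shape (1, 1, 256)
# and hold the same values, which is why the original code still worked
print(torch.equal(hidden[0], lstm_out))  # True
```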
Thanks for clarifying it in such detail, @gabrielloye. If you could update your already great blog post with these comments, I think it would be more complete.