Ah, I thought that the `decoder_state` input had a dim of (batch_size, decoder_time_length, state_dim), but now I see it has (batch_size, 1, state_dim). Since s_i = RNN(s_{i-1}, y_{i-1}, c_{i-1}), we cannot use all decoder states at once to obtain the context. Sorry for my hasty question.

But even if `decoder_state` has a dim of (batch_size, 1, state_dim), isn't it enough to do `context = torch.bmm(attention_score, listener_feature).squeeze(dim=1)` to obtain the context? After a quick modification and run, it seems to work as expected.
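For reference, here is a minimal sketch of that single-step computation; the tensor names and shapes are assumptions based on the discussion (one attention distribution per decoder step over the listener timesteps), not code taken from the repository:

```python
import torch

# Hypothetical shapes for illustration only
batch_size, listener_T, feat_dim = 4, 100, 256

# attention_score: softmax weights over listener timesteps for the
# current decoder step, shape (batch_size, 1, listener_T)
attention_score = torch.softmax(torch.randn(batch_size, 1, listener_T), dim=-1)

# listener_feature: encoder (listener) outputs,
# shape (batch_size, listener_T, feat_dim)
listener_feature = torch.randn(batch_size, listener_T, feat_dim)

# bmm sums the listener features weighted by the attention scores;
# squeeze drops the singleton decoder-time axis:
# (batch, 1, listener_T) @ (batch, listener_T, feat_dim)
#   -> (batch, 1, feat_dim) -> (batch, feat_dim)
context = torch.bmm(attention_score, listener_feature).squeeze(dim=1)
print(context.shape)  # torch.Size([4, 256])
```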
@jinserk Hello, did you get the code working for TIMIT and LibriSpeech? Please take a look at https://github.com/Alexander-H-Liu/Listen-Attend-and-Spell-Pytorch/issues/12. Thank you!
@jinserk Sorry for the late reply... you're right, `bmm` would be a more elegant/efficient way to do it; this was fixed while rebasing the project. Thanks again for your suggestion.
Hi @XenderLiu,

First of all, thanks for sharing this great project. I have a question about how the "context" is obtained in the Attention class. According to eq. (11) in the paper, the context c_i is a vector at time step i, where i is the decoder timestep. So in my opinion, the context should have a dimension of (batch_size, decoder_time_length, listener_feature_dim). In that case, `context = torch.bmm(attention_score, listener_feature)` would be enough. Isn't it? Please let me know if my understanding is wrong. Thank you! Jinserk
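To illustrate the full-sequence case described above, here is a small sketch; the tensor names and shapes are assumptions for illustration, not code from the repository:

```python
import torch

# Hypothetical shapes for illustration only
batch_size, decoder_T, listener_T, feat_dim = 4, 20, 100, 256

# One attention distribution per decoder step,
# shape (batch, decoder_T, listener_T)
attention_score = torch.softmax(
    torch.randn(batch_size, decoder_T, listener_T), dim=-1)
listener_feature = torch.randn(batch_size, listener_T, feat_dim)

# bmm yields one context vector per decoder step:
# (batch, decoder_T, listener_T) @ (batch, listener_T, feat_dim)
#   -> (batch, decoder_T, feat_dim)
context = torch.bmm(attention_score, listener_feature)
print(context.shape)  # torch.Size([4, 20, 256])
```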