Alexander-H-Liu / End-to-end-ASR-Pytorch

This is an open-source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR, implemented in PyTorch, the well-known deep learning toolkit.
MIT License

Question for obtaining context in Attention class #8

Closed jinserk closed 5 years ago

jinserk commented 6 years ago

Hi @XenderLiu,

First of all, thanks for sharing this great project. I have a question about obtaining the "context" in the Attention class. According to eq. (11) in the paper, the context c_i is a vector at time step i, where i is the decoder time step. So in my opinion, the context should have a dimension of (batch_size, decoder_time_length, listener_feature_dim). In that case, context = torch.bmm(attention_score, listener_feature) should be enough, shouldn't it? Please let me know if my understanding is wrong.

Thank you! Jinserk

jinserk commented 6 years ago

Ah, I thought that the decoder_state input had a dimension of (batch_size, decoder_time_length, state_dim), but now I see it is (batch_size, 1, state_dim). Since s_i = RNN(s_{i-1}, y_{i-1}, c_{i-1}), we cannot use all decoder states at once to obtain the context. Sorry for my hasty question.
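The recurrence above is why attention has to run inside the decoder loop: the context from step i-1 feeds the state at step i, so the scores cannot be computed for all decoder steps at once. A minimal sketch of that loop, with hypothetical names and dimensions (this is not the repo's actual API, and dot-product attention stands in for whatever scoring the project uses):

```python
import torch
import torch.nn as nn

B, T, D, H = 2, 30, 64, 64                        # batch, encoder steps, feat dim, state dim
listener_feature = torch.randn(B, T, D)           # encoder (listener) outputs

# Decoder cell consumes [prev-token embedding ; previous context]
rnn_cell = nn.LSTMCell(D + D, H)
s, c_cell = torch.zeros(B, H), torch.zeros(B, H)  # decoder hidden/cell state
context = torch.zeros(B, D)                       # c_0

for _ in range(5):                                # 5 decoder steps
    y_embed = torch.zeros(B, D)                   # placeholder for prev-token embedding
    # s_i = RNN(s_{i-1}, y_{i-1}, c_{i-1})
    s, c_cell = rnn_cell(torch.cat([y_embed, context], dim=-1), (s, c_cell))
    # dot-product attention: scores (B, 1, T) from the new state and encoder features
    score = torch.softmax(
        torch.bmm(s.unsqueeze(1), listener_feature.transpose(1, 2)), dim=-1)
    # context for this step, used again at the next step
    context = torch.bmm(score, listener_feature).squeeze(dim=1)   # (B, D)
```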

jinserk commented 6 years ago

But even if decoder_state has a dimension of (batch_size, 1, state_dim), isn't it enough to do context = torch.bmm(attention_score, listener_feature).squeeze(dim=1) to obtain the context? After a quick modification and run, it appears to work as expected.
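Shape-wise the suggestion checks out: torch.bmm over (B, 1, T) x (B, T, D) gives (B, 1, D), and squeezing the singleton step dimension yields the (B, D) context vector. A small self-contained sketch (dimensions are illustrative, not the repo's):

```python
import torch

batch_size, enc_T, feat_dim = 4, 50, 256

# One decoder step's attention weights over the encoder time axis: (B, 1, T)
attention_score = torch.softmax(torch.randn(batch_size, 1, enc_T), dim=-1)
# Encoder (listener) outputs: (B, T, D)
listener_feature = torch.randn(batch_size, enc_T, feat_dim)

# Batched matmul: (B, 1, T) x (B, T, D) -> (B, 1, D); drop the step dim
context = torch.bmm(attention_score, listener_feature).squeeze(dim=1)  # (B, D)
```

This replaces an explicit loop over encoder frames (weighting each listener_feature frame by its score and summing) with a single batched matrix multiply.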

ghost commented 5 years ago

@jinserk Hello, did you get the code working for TIMIT and LibriSpeech? Please take a look at https://github.com/Alexander-H-Liu/Listen-Attend-and-Spell-Pytorch/issues/12. Thank you!

Alexander-H-Liu commented 5 years ago

@jinserk Sorry for the late reply... you're right, bmm is a more elegant and efficient way to do it. This was fixed while rebasing the project.

Thanks again for your suggestion.