HawkAaron / warp-transducer

A fast parallel implementation of RNN Transducer.
Apache License 2.0

joint network defined in source code #12

Closed rzcwade closed 5 years ago

rzcwade commented 5 years ago

Hi @HawkAaron ,

I was trying to understand your rnnt loss implementation. Could you point me to where in your C++ source code you defined the output network (equations (16), (17), and (18) from the 2013 paper "Speech Recognition with Deep Recurrent Neural Networks")?

Thanks!

HawkAaron commented 5 years ago

As I said in this issue, the activation can be derived from equations 16 and 17. Equation 18 is the softmax.
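For readers without the paper at hand, the structure those equations describe is roughly the following (the weight and bias names here are my own notation, not the paper's):

```latex
h_{t,u} = \tanh\left(W_{l}\, l_{t} + W_{p}\, p_{u} + b_{h}\right)
\qquad\text{(hidden joint activation)}
```
```latex
y_{t,u} = W_{y}\, h_{t,u} + b_{y}
\qquad\text{(output logits)}
```
```latex
\Pr(k \mid t, u) = \frac{\exp\left(y_{t,u}^{k}\right)}{\sum_{k'} \exp\left(y_{t,u}^{k'}\right)}
\qquad\text{(softmax over the vocabulary, incl. blank)}
```

Here $l_t$ is the encoder (transcription network) output at frame $t$ and $p_u$ is the decoder (prediction network) output at label position $u$.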

rzcwade commented 5 years ago

Hi @HawkAaron , thanks for your reply.

Please allow me to clarify; the process on my end would be:

1. Take the encoder output (l_t) and decoder output (p_u).
2. Add a dense layer to each output to match the feature dimensions.
3. Add a tanh activation layer to produce (h_t_u).
4. Add a dense layer to produce (y_t_u).
5. Send y_t_u to your rnnt loss function.

Is this correct?

Thanks for your patience!

HawkAaron commented 5 years ago

That's right. There is one detail about the activations: https://github.com/HawkAaron/warp-transducer/issues/9#issuecomment-432912539 For the CPU version, you should call log_softmax(y_t_u) before sending it to the rnnt loss.
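The five steps above can be sketched with plain NumPy. This is only an illustrative shape check, not code from warp-transducer; all weight names and dimensions below are made up for the example, and the final log_softmax is the extra call needed before the CPU rnnt loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_softmax(x, axis=-1):
    """Numerically stable log-softmax (equation 18 is its exponential)."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def joint_network(l, p, W_l, W_p, b_h, W_y, b_y):
    """Steps 2-4: per-branch dense layers, add, tanh, then an output dense layer."""
    f = l @ W_l                                        # (T, J) encoder projection
    g = p @ W_p                                        # (U, J) decoder projection
    h = np.tanh(f[:, None, :] + g[None, :, :] + b_h)   # (T, U, J) -> h_t_u
    return h @ W_y + b_y                               # (T, U, V) logits y_t_u

# Toy dimensions: T encoder frames, U label positions, vocab size V (incl. blank).
T, U, V, H_enc, H_dec, J = 4, 3, 5, 8, 6, 7
l = rng.standard_normal((T, H_enc))                    # step 1: encoder output l_t
p = rng.standard_normal((U, H_dec))                    # step 1: decoder output p_u
W_l = rng.standard_normal((H_enc, J))
W_p = rng.standard_normal((H_dec, J))
b_h = rng.standard_normal(J)
W_y = rng.standard_normal((J, V))
b_y = rng.standard_normal(V)

logits = joint_network(l, p, W_l, W_p, b_h, W_y, b_y)
log_probs = log_softmax(logits)   # step 5: the CPU rnnt loss expects log-softmax input
print(log_probs.shape)            # (4, 3, 5)
```

Broadcasting `f[:, None, :] + g[None, :, :]` produces the full (T, U, J) lattice that the transducer loss is computed over.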

rzcwade commented 5 years ago

That's clear! Thanks!