Closed: rzcwade closed this issue 5 years ago.
As I said in this issue, the activations can be derived from equations (16) and (17); equation (18) is the softmax.
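For readers following the thread, the referenced equations from Graves (2013) have the shape below; the weight/bias symbols are approximate, but the structure matches the paper:

```latex
% Output network of Graves (2013), "Speech Recognition with Deep
% Recurrent Neural Networks"; weight-matrix symbols are approximate.
h_{t,u} = \tanh\left(W_{l}\, l_t + W_{p}\, p_u + b_h\right)   % (16): combine encoder/decoder outputs
y_{t,u} = W_{y}\, h_{t,u} + b_y                               % (17): linear output layer
\Pr(k \mid t, u) = \frac{\exp(y_{t,u}[k])}{\sum_{k'} \exp(y_{t,u}[k'])}   % (18): softmax
```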
Hi @HawkAaron , thanks for your reply.
Please allow me to clarify this. The process on my end would be:
1. take the encoder output (l_t) and the decoder output (p_u);
2. add a dense layer to each output to match the feature dimensions;
3. apply a tanh activation to the sum of the two projections to obtain h_t_u;
4. add a dense layer to produce y_t_u;
5. send y_t_u to your rnnt loss function (see the sketch below).
Is this correct?
Thanks for your patience!
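For concreteness, here is a minimal PyTorch sketch of steps 1–4; the module and dimension names (enc_dim, dec_dim, joint_dim) are illustrative, not taken from the repo:

```python
import torch
import torch.nn as nn

class JointNetwork(nn.Module):
    """Sketch of the output (joint) network: eqs. (16)-(17) of Graves (2013)."""
    def __init__(self, enc_dim, dec_dim, joint_dim, vocab_size):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, joint_dim)  # dense layer on encoder output l_t
        self.dec_proj = nn.Linear(dec_dim, joint_dim)  # dense layer on decoder output p_u
        self.out = nn.Linear(joint_dim, vocab_size)    # dense layer producing y_t_u

    def forward(self, l_t, p_u):
        # l_t: (B, T, enc_dim), p_u: (B, U+1, dec_dim)
        # Broadcast to (B, T, U+1, joint_dim), sum, then tanh (eq. 16).
        h = torch.tanh(self.enc_proj(l_t).unsqueeze(2) + self.dec_proj(p_u).unsqueeze(1))
        return self.out(h)  # eq. 17: logits y_t_u of shape (B, T, U+1, vocab)
```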
That's right. There is one detail for the activations: https://github.com/HawkAaron/warp-transducer/issues/9#issuecomment-432912539 For the CPU version, you should call log_softmax(y_t_u) before sending it to the rnnt loss.
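A short sketch of that last step; the RNNTLoss import path and argument order below reflect my reading of warp-transducer's PyTorch binding and should be treated as assumptions to verify against the repo's README:

```python
import torch
import torch.nn.functional as F
from warprnnt_pytorch import RNNTLoss  # assumed binding from this repo

rnnt_loss = RNNTLoss()

y_t_u = torch.randn(2, 50, 11, 30)        # (batch, T, U+1, vocab) logits from the joint network
log_probs = F.log_softmax(y_t_u, dim=-1)  # required before the CPU version of the loss

labels = torch.randint(1, 30, (2, 10), dtype=torch.int32)  # (batch, U) target labels
act_lens = torch.tensor([50, 50], dtype=torch.int32)       # encoder output lengths
label_lens = torch.tensor([10, 10], dtype=torch.int32)     # label sequence lengths

loss = rnnt_loss(log_probs, labels, act_lens, label_lens)
```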
That's clear! Thanks!
Hi @HawkAaron ,
I was trying to understand your rnnt loss implementation. Could you point me to where in your C++ source code you define the output network (equations (16), (17), and (18) from the 2013 paper "Speech Recognition with Deep Recurrent Neural Networks")?
Thanks!