Hmm yes, I think you misunderstood the viterbi function. It's really meant to simulate teacher forcing, so you would call it with the full predictions and not in a greedy fashion. Also, the viterbi will remove blanks, so you won't be guaranteed an output for every input frame.
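To make that concrete, here's a toy, unbatched reference of what a teacher-forced Viterbi alignment over the RNN-T lattice computes. This is not the repo's implementation (the real function's arguments may differ); it's just to show that you pass the full label sequence and get back the labels emitted at each frame, with blanks dropped:

```python
import torch

# Toy, unbatched reference of Viterbi alignment over the RNN-T lattice
# (illustration only, not this repo's implementation).
#   log_probs: (T, U + 1, V) joint log-probs for every (frame, label-position)
#   labels:    (U,) ground-truth label sequence (teacher forcing)
#   blank:     blank symbol id
# Returns frames[t] = list of labels emitted at frame t (blanks dropped).
def viterbi_align(log_probs, labels, blank=0):
    T, U1, _ = log_probs.shape
    U = U1 - 1
    assert labels.numel() == U

    # alpha[t, u] = best log-prob of any path reaching lattice node (t, u)
    alpha = torch.full((T, U1), float("-inf"))
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U1):
            if t > 0:  # arrive by emitting blank at (t - 1, u)
                alpha[t, u] = alpha[t - 1, u] + log_probs[t - 1, u, blank]
            if u > 0:  # arrive by emitting label u - 1 at (t, u - 1)
                cand = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]]
                alpha[t, u] = torch.maximum(alpha[t, u], cand) if t > 0 else cand

    # Backtrack from (T - 1, U), recording which labels each frame emitted.
    frames = [[] for _ in range(T)]
    t, u = T - 1, U
    while t > 0 or u > 0:
        via_label = u > 0 and bool(
            alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]] == alpha[t, u]
        )
        if via_label:
            frames[t].append(int(labels[u - 1]))
            u -= 1
        else:
            t -= 1
    for f in frames:
        f.reverse()
    return frames

# Example: 6 frames, 3 labels, vocab of 5 (0 = blank).
log_probs = torch.randn(6, 4, 5).log_softmax(-1)
print(viterbi_align(log_probs, torch.tensor([1, 3, 2])))
```

For frames where the best path only emits blank, `frames[t]` comes back empty, which is why you don't get an output for every input frame.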
Thank you very much, awni! I really like this project, as it's really fast and memory efficient. It would be even better if there were a high-performance end-to-end decoding interface, because I have implemented a Python one which is too slow to use.
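For reference, my Python decoder is roughly the loop below (the encoder/predictor/joiner modules and the blank id are placeholders, not this repo's API); it is slow because every frame and every emitted symbol costs at least one tiny forward call in the Python interpreter:

```python
import torch

# Rough sketch of greedy RNN-T decoding in pure Python/PyTorch.
# encoder/predictor/joiner are placeholder modules, not this repo's API.
@torch.no_grad()
def greedy_decode(encoder, predictor, joiner, features, blank=0, max_symbols=3):
    enc = encoder(features.unsqueeze(0))                   # (1, T, H)
    dec, state = predictor(torch.tensor([[blank]]), None)  # (1, 1, H)
    hyp = []
    for t in range(enc.size(1)):
        emitted = 0
        while emitted < max_symbols:                       # cap symbols per frame
            logits = joiner(enc[:, t], dec[:, 0])          # (1, V)
            pred = int(logits.argmax(dim=-1))
            if pred == blank:                              # advance to next frame
                break
            hyp.append(pred)
            emitted += 1
            dec, state = predictor(torch.tensor([[pred]]), state)
    return hyp
```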
By the way, there is an alternative implementation in k2 called rnnt_loss_simple.
There is training code and decoding code for it.
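A rough sketch of how it is called (argument names and defaults may differ between k2 versions, so please check the k2 documentation). "Simple" here means the joiner is just `am + lm`, so the loss consumes the two per-sequence projections directly instead of a materialized `(B, T, S + 1, C)` tensor:

```python
import torch
import k2

B, T, S, C = 2, 50, 10, 500               # batch, frames, label length, vocab

am = torch.randn(B, T, C)                 # encoder output projected to vocab
lm = torch.randn(B, S + 1, C)             # prediction net projected to vocab
symbols = torch.randint(1, C, (B, S))     # targets; 0 is used as blank here

# boundary[b] = [begin_symbol, begin_frame, end_symbol, end_frame]
boundary = torch.tensor([[0, 0, S, T]] * B, dtype=torch.int64)

loss = k2.rnnt_loss_simple(
    lm=lm,
    am=am,
    symbols=symbols,
    termination_symbol=0,                 # blank id
    boundary=boundary,
)
print(loss)
```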
Although a joiner network with only a simple adder can save memory, in our previous experience it leads to a degradation in WER when used for ASR training.
@csukuangfj the degradation in WER from using a simple joiner is minor in my experience. The benefits of the low-memory implementation are:
Overall I think the cost of the joiner is not worth the benefit, though it would be nice to see a careful study there. However, if the above situation does not apply (small token sets, short utterances, small batches), then you won't get much gain from the memory-lite and faster version.
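For anyone following along, the two joiner variants under discussion look roughly like this (shapes and module names are illustrative, not this repo's code). The memory saving comes from the fact that with the additive joiner a fused loss (like the memory-lite path here, or k2's rnnt_loss_simple) never has to materialize the 4-D tensor at all:

```python
import torch
import torch.nn as nn

# "Simple" joiner: a plain broadcast sum of the two projections. A fused loss
# can compute this on the fly and never store the (B, T, U + 1, V) result.
class AdditiveJoiner(nn.Module):
    def forward(self, am, lm):
        # am: (B, T, V), lm: (B, U + 1, V) -> (B, T, U + 1, V) by broadcasting
        return am.unsqueeze(2) + lm.unsqueeze(1)

# Full joiner: a nonlinearity plus an output projection, which forces a
# (B, T, U + 1, H) hidden tensor into memory; that tensor is what dominates
# memory for long utterances, big vocabularies, or large batches.
class FullJoiner(nn.Module):
    def __init__(self, hidden_dim, vocab_size):
        super().__init__()
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, am, lm):
        h = torch.tanh(am.unsqueeze(2) + lm.unsqueeze(1))  # (B, T, U + 1, H)
        return self.out(h)                                 # (B, T, U + 1, V)
```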
Hi, awni! Thanks for your great repo. I have a problem with how to use the decode interface. I have tried code like the following:

```python
B, T, *_ = scores.size()
```

but it always breaks at the first step.