Closed: l2009312042 closed this issue 3 years ago
The normalization is applied only at the end of decoding, as proposed in the paper "Towards End-to-End Speech Recognition with Recurrent Neural Networks" by Graves.
But you can give it a try and see if it performs better when you normalize at each step. Just keep in mind to multiply the bigram probability onto the un-normalized probability.
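To make the two options concrete, here is a minimal sketch (not the actual CTCWordBeamSearch code; `pr_total`, `num_words`, and the helper names are hypothetical stand-ins for the beam state): normalize once when picking the final best beam, or normalize at every step for ranking/pruning only while keeping the raw score for further extensions, so bigram probabilities are still multiplied onto the un-normalized probability.

```python
def length_normalized(pr_total, num_words):
    """Length normalization as in the quoted snippet: pr_total ** (1 / (num_words + 1))."""
    return pr_total ** (1.0 / (num_words + 1))

# Variant 1: normalize only once, when selecting the final best beam.
def best_beam_final(beams):
    # beams: list of (pr_total, num_words) tuples
    return max(beams, key=lambda b: length_normalized(b[0], b[1]))

# Variant 2: normalize at every time step, but only for ranking/pruning.
# The un-normalized pr_total is kept in the beam so that subsequent
# bigram probabilities multiply onto the raw (un-normalized) score.
def prune_beams(beams, beam_width):
    ranked = sorted(beams, key=lambda b: length_normalized(b[0], b[1]),
                    reverse=True)
    return ranked[:beam_width]
```

Note that in variant 2 the normalized value is used only as a sort key; overwriting `pr_total` itself at each step (as in the quoted snippet) would normalize repeatedly and change the scores the bigram model multiplies onto.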
I found the code in your CTCWordBeamSearch; the normalization is used at each step, as follows:
if numWords >= 1: beam.textual.prTotal = beam.textual.prTotal ** (1 / (numWords + 1))
I will give it a try and compare the performance, thanks.
It's great work, thanks to the author.
I have a question about beam search + LM: why is last.norm() applied only at the last step, and not at every time step? The longer the sequence, the smaller the LM score, so it should be compensated by length normalization. I think it should be normalized at every time step; is that right? Thanks in advance.
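The intuition behind the question can be shown numerically. This is a hedged sketch (the function names are illustrative, not from the repository): each bigram probability is below 1, so the raw product decays with sequence length, while an average log-probability per word removes that length bias.

```python
import math

def lm_score(bigram_probs):
    """Raw LM score: product of per-word probabilities (each < 1)."""
    s = 1.0
    for p in bigram_probs:
        s *= p
    return s

def normalized_log_score(bigram_probs):
    """Average log-probability per word, independent of sequence length."""
    return sum(math.log(p) for p in bigram_probs) / len(bigram_probs)

short = [0.5, 0.5]   # 2 words
long = [0.5] * 6     # 6 words with the same per-word quality

# The raw product penalizes the longer sequence (0.25 vs 0.015625),
# while the normalized score rates both equally at log(0.5).
```

This is why some form of length normalization is needed when comparing beams of different lengths; whether it is applied once at the end (as here, following Graves) or at every step is the implementation choice discussed above.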