githubharald / CTCDecoder

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.
https://towardsdatascience.com/3797e43a86c
MIT License

In beamsearch.py, why is last.norm() only used in the last step? #20

Closed l2009312042 closed 3 years ago

l2009312042 commented 3 years ago

It's great work, thanks to the author.

I have a question about beam search + LM: why is last.norm() applied only at the last step? Why not apply last.norm() at every time step? The longer the sequence, the smaller the LM probability, so it should be compensated by length normalization. I think the score should be normalized at every time step. Is that right? Thanks in advance.

githubharald commented 3 years ago

The normalization is applied only at the end of decoding, as proposed in the paper "Towards End-to-End Speech Recognition with Recurrent Neural Networks" by Graves.

But you can give it a try and see if it performs better when you normalize in each step. Just keep in mind to multiply the bigram probability onto the un-normalized probability.
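The approach described above can be sketched as follows. This is an illustrative example, not the actual beamsearch.py implementation: the un-normalized beam probability is kept throughout decoding (so the bigram LM can be multiplied onto it at each step), and length normalization is applied only when ranking the final beams. The function and variable names are hypothetical.

```python
# Sketch: length normalization applied only at the end of beam search decoding.
# Names are illustrative, not the actual beamsearch.py API.

def normalized_score(pr_total: float, num_chars: int) -> float:
    """Length-normalize a beam's total probability: pr_total ** (1/N)."""
    if num_chars == 0:
        return pr_total
    return pr_total ** (1.0 / num_chars)

def rank_final_beams(beams):
    """beams: list of (text, un-normalized pr_total) pairs.
    Returns the beams best-first by length-normalized score."""
    return sorted(
        beams,
        key=lambda b: normalized_score(b[1], len(b[0])),
        reverse=True,
    )

# Example: a longer beam with a lower raw probability can still win after
# normalization, which is exactly the point of compensating for length.
beams = [("cat", 0.001), ("ca", 0.002)]
best_text, _ = rank_final_beams(beams)[0]
```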

l2009312042 commented 3 years ago

I found that in the code of your CTCWordBeamSearch project, the normalization is applied at each step, like this: if numWords >= 1: beam.textual.prTotal = beam.textual.prTotal ** (1 / (numWords + 1)). I will give it a try and compare the performance. Thanks.
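The per-step variant quoted above can be sketched like this. It is modeled on the one-line snippet from the comment, not on the full CTCWordBeamSearch internals, and the function name is hypothetical: the beam probability is normalized by the number of completed words after each step, so longer texts are not increasingly penalized as decoding proceeds.

```python
# Sketch of the per-step word-count normalization quoted from
# CTCWordBeamSearch (illustrative only, not the library's internals).

def per_step_normalize(pr_total: float, num_words: int) -> float:
    """Normalize a beam's probability by completed word count:
    pr_total ** (1 / (numWords + 1)), applied only once at least
    one word is complete."""
    if num_words >= 1:
        return pr_total ** (1.0 / (num_words + 1))
    return pr_total

# Example: with one completed word, a raw probability of 0.01 is
# normalized to 0.01 ** (1/2) = 0.1.
score = per_step_normalize(0.01, 1)
```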