githubharald / CTCDecoder

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.
https://towardsdatascience.com/3797e43a86c
MIT License

Support for K-Gram LM where k > 2? #9

Closed Sushil-Thapa closed 5 years ago

Sushil-Thapa commented 5 years ago

I wanted to experiment with LMs other than the bigram model. Any suggestion on how to approach extending the current codebase, e.g. to condition the probability of the last character on all previous characters?

githubharald commented 5 years ago

the relevant part of the code can be found here: applyLM(...)

Further, you need to create a lookup table containing the n-gram (e.g. n=3) probabilities in the `LanguageModel` class, similar to `initCharBigrams(...)`.
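
For illustration, here is a minimal, self-contained sketch of such a trigram lookup table. The class and method names (`CharTrigramLM`, `getCharTrigram`) and the simple relative-frequency normalization are assumptions made for this sketch; they only mirror the role of `initCharBigrams(...)`, they are not the repository's actual API:

```python
# A minimal sketch of a character-trigram lookup table (names are illustrative,
# not taken from the repository).
class CharTrigramLM:
    def __init__(self, txt, classes):
        "count all character trigrams occurring in the training text"
        self.trigram = {a: {b: {c: 0 for c in classes} for b in classes} for a in classes}
        for i in range(len(txt) - 2):
            a, b, c = txt[i], txt[i + 1], txt[i + 2]
            if a in self.trigram and b in self.trigram[a] and c in self.trigram[a][b]:
                self.trigram[a][b][c] += 1

    def getCharTrigram(self, a, b, c):
        "P(c | a, b) estimated as a relative frequency"
        numTotal = sum(self.trigram[a][b].values())
        return self.trigram[a][b][c] / numTotal if numTotal else 0.0


# usage: probability of 'e' following the context 'th'
lm = CharTrigramLM('the three thieves', ' abcdefghijklmnopqrstuvwxyz')
print(lm.getCharTrigram('t', 'h', 'e'))
```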

Sushil-Thapa commented 5 years ago

Thank you for your reply. Yes, I've created the lookup for trigrams in LanguageModel.

Does it make sense to get the second char (prior) for the trigram via `classes[parentBeam.labeling[-2] if parentBeam.labeling else classes.index(' ')]`, just like `labeling[-1]` for the first char?

[the third char will now be conditioned on the first and second chars]

githubharald commented 5 years ago

What I used for the case n=2 was a trick to handle the case that there is only one character in the beam text: then I just say, well, let's assume there is a whitespace as the first character (which makes some sense for written text).
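
A hedged sketch of that idea extended to n=3: missing context characters are simply treated as whitespace. The helper name and the use of a plain tuple for the beam's labeling are assumptions for illustration, not the repository's actual code:

```python
# Sketch: extract the n-1 context characters from a beam's labeling, padding
# with whitespace on the left when the beam text is still too short
# (generalizing the n=2 whitespace trick). "labeling" is the tuple of label
# indices stored in a beam; "classes" maps an index back to a character.
def getContextChars(labeling, classes, n=3):
    chars = [classes[l] for l in labeling[-(n - 1):]]
    return [' '] * (n - 1 - len(chars)) + chars

# examples for n=3:
classes = ' abcdefghijklmnopqrstuvwxyz'
print(getContextChars((20, 8), classes))  # ['t', 'h'] -> two real context chars
print(getContextChars((20,), classes))    # [' ', 't'] -> padded with whitespace
print(getContextChars((), classes))       # [' ', ' '] -> empty beam text
```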

You could also implement a fall-back; let's say you use an n-gram with n=3: if the beam text only contains one character, fall back to the bigram probability, and if it is empty, fall back to the unigram probability (see the sketch below).
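
A possible back-off, sketched under the assumption that bigram and unigram tables exist alongside the trigram table; the `lm` methods used here (`getCharTrigram`, `getCharBigram`, `getCharUnigram`) are assumed helpers, not functions taken from the repository:

```python
# Sketch of a back-off: use the trigram when two context chars are available,
# otherwise fall back to bigram / unigram probabilities.
def getCharProb(lm, context, c):
    "probability of character c given the characters of the beam text so far"
    if len(context) >= 2:
        return lm.getCharTrigram(context[-2], context[-1], c)  # two prior chars known
    if len(context) == 1:
        return lm.getCharBigram(context[-1], c)                # only one prior char
    return lm.getCharUnigram(c)                                 # empty beam text
```

A fuller scheme would also back off (or smooth) when the trigram count itself is zero, which is covered in the chapter mentioned below.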

I would suggest reading the relevant chapters (especially "Language Modeling with N-Grams") in the book "Speech and Language Processing" by Jurafsky. There you can find multiple useful methods and algorithms which can be implemented in the language model of the decoder.