Different beam search output using different blankIdx value

githubharald / CTCDecoder

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

MIT License

817 stars 182 forks source link

Hi, I have a question regarding your beam search implementation.

On your ctcBeamSearch method, you put value on blankIdx equals to length of the classes (in this case, the known letters and symbols). But on some other beam-search implementation, they put zero on it.

I tested this using your example, and indeed it differs both in decoded result and how far is it from the ground truth (i'm using CER and WER)

=====Line example (using blankIdx = len(classes))=====
TARGET                  : "the fake friend of the family, like the"
BEAM SEARCH             : "the fak friend of the fomcly hae tC" CER: CER/WER: 0.25714/0.15000
BEAM SEARCH LM          : "the fake friend of the family, lie th" CER: CER/WER: 0.05405/0.03226
=====Line example (using blankIdx=0)=====
TARGET                  : "the fake friend of the family, like the"
BEAM SEARCH             : "the faetker friend of ther foarmnacly,  harse. tHhC." CER: CER/WER: 0.33333/0.22368
BEAM SEARCH LM          : "the fake friend of the family, like the " CER: CER/WER: 0.00000/0.00000

So is there a different case where the blankIdx is not zero? Which value is suitable for beam search decoding?

githubharald / CTCDecoder

Different beam search output using different blankIdx value #19