githubharald / CTCDecoder

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.
https://towardsdatascience.com/3797e43a86c
MIT License
817 stars, 182 forks

Fixed transition to new words skipping probabilities #15

Closed ChWick closed 4 years ago

ChWick commented 4 years ago

I ran into several problems with rather trivial dictionaries and charsets (see below):

    from itertools import groupby

    import matplotlib.pyplot as plt
    import numpy as np

    # greedy: take the most likely class per time step,
    # merge repeated labels, then drop blanks
    best_path = np.argmax(probabilities, axis=1)
    blank_idx = 0
    best_chars_collapsed = [classes[k] for k, _ in groupby(best_path) if k != blank_idx]
    res = ' '.join(best_chars_collapsed)
    print("Greedy-Decoder: ", res)

    # token passing
    tp = ctcTokenPassing(probabilities, classes, dictionary, blankIdx=blank_idx)
    print("Token-Passing: ", tp)

    # visualize the log-probabilities (time steps x classes)
    plt.imshow(np.log(probabilities))
    plt.show()
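For anyone who wants to try the greedy step without the attached files, here is a minimal self-contained sketch. The function name `greedy_decode`, the `sep` parameter, and the toy matrix are made up for illustration and are not part of the repository; the blank is assumed to sit at index 0, as in the snippet above.

```python
from itertools import groupby

import numpy as np


def greedy_decode(probabilities, classes, blank_idx=0, sep=""):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = np.argmax(probabilities, axis=1)
    # groupby merges runs of identical labels into a single label
    collapsed = [int(k) for k, _ in groupby(best_path)]
    return sep.join(classes[k] for k in collapsed if k != blank_idx)


# Toy input (made up): 5 frames over the classes [blank, "a", "b"].
classes = ["-", "a", "b"]
probabilities = np.array([
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two "a"s)
    [0.20, 0.70, 0.10],  # a
    [0.10, 0.10, 0.80],  # b
])

print(greedy_decode(probabilities, classes))  # prints "aab"
```

The blank between the second and fourth frame is what keeps the two "a" labels from being merged into one.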



The problem is that the current code allows a single output to be skipped on the first transition (0->1,2), because `toks.get(wIdx, s - 1, t - 1)` is chosen, which does not include the probability at `t`. Using `toks.get(wIdx, s - 1, t)` instead fixes this, because it includes the emission probability for `blank`.
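To illustrate why a skipped emission matters, here is a toy sketch. It is not the repository's token passing code: the helper `path_log_score`, its `skip_frame` parameter, and the matrix are all hypothetical. It only shows that a path which does not pay the log-probability of one frame can outscore a path that is actually better.

```python
import numpy as np


def path_log_score(log_probs, path, skip_frame=None):
    """Sum per-frame log-probabilities along `path`.

    If `skip_frame` is given, that frame's emission is left out,
    mimicking a score that misses one time step.
    """
    return sum(log_probs[t, c] for t, c in enumerate(path) if t != skip_frame)


# Made-up matrix: 3 frames over the classes [blank, "a"].
probs = np.array([
    [0.9, 0.1],  # blank is likely
    [0.1, 0.9],  # "a" is likely
    [0.9, 0.1],  # blank is likely
])
log_probs = np.log(probs)

good_path = [0, 1, 0]  # blank, a, blank: follows the likely labels
bad_path = [1, 1, 0]   # a, a, blank: unlikely label in frame 0

full_good = path_log_score(log_probs, good_path)                 # log(0.9 * 0.9 * 0.9)
full_bad = path_log_score(log_probs, bad_path)                   # log(0.1 * 0.9 * 0.9)
skipped_bad = path_log_score(log_probs, bad_path, skip_frame=0)  # log(0.9 * 0.9)

# With every frame counted the good path wins; once frame 0 is
# skipped, the bad path gets the higher score.
print(full_bad < full_good < skipped_bad)
```

Since every log-probability is at most zero, leaving one out can only raise a score, so any comparison between a token that paid for frame `t` and one that did not is biased toward the latter.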

There might be another crucial issue if there is no blank between two words. I had no time to check this, though.

Appendix:
![probabilities](https://user-images.githubusercontent.com/6162410/70219628-7026e180-1745-11ea-9ded-5d33dd1704c4.png)

[charset.txt](https://github.com/githubharald/CTCDecoder/files/3925859/charset.txt)
[dictionary.txt](https://github.com/githubharald/CTCDecoder/files/3925860/dictionary.txt)
[propabilities.zip](https://github.com/githubharald/CTCDecoder/files/3925862/propabilities.zip)

githubharald commented 4 years ago

Thank you for the detailed analysis. I hope that I will soon have time to look at it.

Best, Harald

githubharald commented 4 years ago

I still haven't had the time to look at this in more detail. However, your analysis seems reasonable, so I merged it. Thanks for your contribution :+1:. For reasons of consistency, I changed the blank index back to the (fixed) last position in the RNN output.
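If a model emits the blank at index 0, as in the snippet from the report, the probability matrix has to be reordered before it matches the "blank last" convention. One possible way is sketched below; the helper `move_blank_to_last` and the toy matrix are hypothetical and not part of the repository.

```python
import numpy as np


def move_blank_to_last(probabilities, blank_idx=0):
    """Return a copy of the matrix with column `blank_idx` moved to the end."""
    num_classes = probabilities.shape[1]
    order = [i for i in range(num_classes) if i != blank_idx] + [blank_idx]
    return probabilities[:, order]


# Made-up matrix: 2 frames, columns are [blank, char0, char1].
probs_blank_first = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
])

# Columns are now [char0, char1, blank].
probs_blank_last = move_blank_to_last(probs_blank_first)
print(probs_blank_last)
```

The character list passed to the decoder would need to follow the same column order as the reordered matrix.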