Update 2021: installable Python package
Python implementation of some common Connectionist Temporal Classification (CTC) decoding algorithms. A minimalistic language model is provided.
Install the package by running pip install . from the root level of the repository. Then go to the tests/ folder and execute pytest to check if the installation worked.
Here is a minimalistic executable example:
import numpy as np
from ctc_decoder import best_path, beam_search
mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
chars = 'ab'
print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars)}"')
The output mat
(numpy array, softmax already applied) of the CTC-trained neural network is expected to have shape TxC
and is passed as the first argument to the decoders.
T is the number of time-steps, and C the number of characters (the CTC-blank is the last element).
The characters that can be predicted by the neural network are passed as the chars
string to the decoder.
Decoders return the decoded string.
Running the code outputs:
Best path: ""
Beam search: "a"
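For intuition, best path decoding can be sketched in a few lines of plain numpy (a simplified re-implementation, not the library's code): take the argmax character at each time-step, collapse repeated characters, and drop blanks. It also shows why the example yields an empty string: the blank has the highest score at both time-steps.

```python
import numpy as np

def best_path_decode(mat, chars):
    # most probable character index per time-step
    best_indices = np.argmax(mat, axis=1)
    blank = len(chars)  # the CTC-blank is the last element
    res = []
    prev = None
    for idx in best_indices:
        # collapse repeats, then drop blanks
        if idx != prev and idx != blank:
            res.append(chars[idx])
        prev = idx
    return ''.join(res)

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
print(best_path_decode(mat, 'ab'))  # the blank wins at both time-steps, so this prints ""
```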
For more examples of how to use the decoders,
please have a look at the scripts in the tests/
folder.
Beam search can optionally integrate a character-level language model. Text statistics (bigrams) are used by beam search to improve reading accuracy.
from ctc_decoder import beam_search, LanguageModel
# create language model instance from a (large) text
lm = LanguageModel('this is some text', chars)
# and use it in the beam search decoder
res = beam_search(mat, chars, lm=lm)
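The internals of LanguageModel are not shown here; conceptually, a character-bigram model counts character pairs in the training text and turns them into conditional probabilities, which beam search can multiply into the scores of candidate extensions. A minimal sketch (class name and add-one smoothing are hypothetical, not the library's implementation):

```python
from collections import defaultdict

class BigramCharLM:
    """Character-bigram statistics from a training text (illustrative sketch)."""
    def __init__(self, text, chars):
        self.chars = chars
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)
        for a, b in zip(text, text[1:]):
            self.counts[a][b] += 1
            self.totals[a] += 1

    def prob(self, a, b):
        # P(b | a) with add-one smoothing over the character set
        return (self.counts[a][b] + 1) / (self.totals[a] + len(self.chars))

lm = BigramCharLM('this is some text', 'abcdefghijklmnopqrstuvwxyz ')
# in the training text, 't' is followed by 'h' but never by 'q'
print(lm.prob('t', 'h'), lm.prob('t', 'q'))
```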
The lexicon search decoder computes a first approximation with best path decoding. Then, it uses a BK-tree to retrieve similar words, scores them and finally returns the best scoring word. The BK-tree is created by providing a list of dictionary words. A tolerance parameter defines the maximum edit distance from the query word to the returned dictionary words.
from ctc_decoder import lexicon_search, BKTree
# create BK-tree from a list of words
bk_tree = BKTree(['words', 'from', 'a', 'dictionary'])
# and use the tree in the lexicon search
res = lexicon_search(mat, chars, bk_tree, tolerance=2)
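The BKTree internals are likewise hidden behind the API above. The idea is that a BK-tree arranges words by edit distance, so the triangle inequality can prune whole subtrees during a lookup instead of comparing the query against every dictionary word. A minimal sketch (class and helper names are hypothetical, not the library's API):

```python
def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance, one row at a time
    d = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return d[-1]

class SimpleBKTree:
    """Metric tree over edit distance (illustrative sketch)."""
    def __init__(self, words):
        it = iter(words)
        self.root = (next(it), {})  # (word, children keyed by distance)
        for w in it:
            self._insert(w)

    def _insert(self, w):
        node = self.root
        while True:
            d = edit_distance(w, node[0])
            if d in node[1]:
                node = node[1][d]
            else:
                node[1][d] = (w, {})
                return

    def query(self, w, tolerance):
        # the triangle inequality prunes children outside [d - tol, d + tol]
        out, stack = [], [self.root]
        while stack:
            word, children = stack.pop()
            d = edit_distance(w, word)
            if d <= tolerance:
                out.append(word)
            stack.extend(children[k] for k in children
                         if d - tolerance <= k <= d + tolerance)
        return out

tree = SimpleBKTree(['words', 'from', 'a', 'dictionary'])
print(tree.query('worms', tolerance=2))  # 'words' is within edit distance 2
```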
Some notes: if the neural network computes a whole batch at once, its output rnn_output
has shape TxBxC, with B the batch dimension. In that case, first extract the sample to be decoded, e.g. the first one:
mat = rnn_output[:, 0, :]
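A small numpy sketch of that extraction (the values are made up; the softmax step is only needed if the network outputs raw scores rather than probabilities):

```python
import numpy as np

# hypothetical raw scores: T=2 time-steps, B=1 batch sample, C=3 characters
rnn_output = np.array([[[1.0, 0.5, 2.0]],
                       [[0.2, 0.1, 1.5]]])
mat = rnn_output[:, 0, :]  # select the first batch sample -> shape TxC
# apply softmax per time-step, if the network has not already done so
mat = np.exp(mat) / np.exp(mat).sum(axis=1, keepdims=True)
print(mat.shape)  # (2, 3)
```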
Recommended decoders:
best_path: best path (or greedy) decoder, the fastest of all algorithms; however, other decoders often perform better
beam_search: beam search decoder, optionally integrates a character-level language model, can be tuned via the beam width parameter
lexicon_search: lexicon search decoder, returns the best scoring word from a dictionary
Other decoders, from my experience not really suited for practical purposes, but which might be used for experiments or research:
prefix_search: prefix search decoder
token_passing: token passing algorithm
(see the extras/ folder)
This paper gives suggestions on when to use best path decoding, beam search decoding and token passing.
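To make the difference to best path decoding concrete, here is a compact sketch of CTC beam search without a language model (a simplified re-implementation, not necessarily the library's exact algorithm). It keeps, per candidate labeling, separate probabilities for paths ending in a blank and in a non-blank, which lets it sum over many alignments of the same text; on the 2x3 example matrix from above it returns "a", while best path returns the empty string.

```python
import numpy as np
from collections import defaultdict

def ctc_beam_search(mat, chars, beam_width=25):
    """Beam search over a TxC matrix; the CTC-blank is the last element."""
    T, C = mat.shape
    blank = C - 1
    # labeling -> [prob of paths ending in blank, prob of paths ending in non-blank]
    beams = {(): [1.0, 0.0]}
    for t in range(T):
        new_beams = defaultdict(lambda: [0.0, 0.0])
        for labeling, (pb, pnb) in beams.items():
            # stay on the same labeling via a blank
            new_beams[labeling][0] += (pb + pnb) * mat[t, blank]
            # stay on the same labeling by repeating its last character
            if labeling:
                new_beams[labeling][1] += pnb * mat[t, labeling[-1]]
            # extend the labeling by one non-blank character
            for c in range(C - 1):
                extended = labeling + (c,)
                if labeling and labeling[-1] == c:
                    # a repeat only counts as a new character after a blank
                    new_beams[extended][1] += pb * mat[t, c]
                else:
                    new_beams[extended][1] += (pb + pnb) * mat[t, c]
        # keep only the highest-scoring labelings
        beams = dict(sorted(new_beams.items(),
                            key=lambda kv: kv[1][0] + kv[1][1],
                            reverse=True)[:beam_width])
    best = max(beams, key=lambda k: beams[k][0] + beams[k][1])
    return ''.join(chars[c] for c in best)

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
print(ctc_beam_search(mat, 'ab'))  # summing over alignments favours "a"
```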