Connectionist Temporal Classification (CTC) decoder with dictionary and Language Model (LM).
pip install .
tests/
and execute pytest
to check if installation workedThe following toy example shows how to use word beam search. The hypothetical model (e.g. a text recognition model) is able to recognize 3 different characters: "a", "b" and " " (whitespace). Words in that toy example can contain the characters "a" and "b" (but not " " which is the word separator). The language model is trained from a text corpus which contains only two words: "a" and "ba".
In this code snippet an instance of word beam search is created, and a TxBx(C+1) shaped numpy array is decoded:
import numpy as np
from word_beam_search import WordBeamSearch
corpus = 'a ba' # two words "a" and "ba", separated by whitespace
chars = 'ab ' # the characters that can be recognized (in this order)
word_chars = 'ab' # characters that form words
# RNN output
# 3 time-steps and 4 characters per time time ("a", "b", " ", CTC-blank)
mat = np.array([[[0.9, 0.1, 0.0, 0.0]],
[[0.0, 0.0, 0.0, 1.0]],
[[0.6, 0.4, 0.0, 0.0]]])
# initialize word beam search (only do this once in your code)
wbs = WordBeamSearch(25, 'Words', 0.0, corpus.encode('utf8'), chars.encode('utf8'), word_chars.encode('utf8'))
# compute label string
label_str = wbs.compute(mat)
The decoder returns a list with a decoded label string for each batch element. To finally get the character strings, map each label to its corresponding character:
char_str = [] # decoded texts for batch
for curr_label_str in label_str:
s = ''.join([chars[label] for label in curr_label_str])
char_str.append(s)
Examples:
tests/test_word_beam_search.py
Parameters of the constructor of the WordBeamSearch
class:
0<len(wordChars)<len(chars)
. In case only single words have to be detected, there is no need for a separating character, therefore the two parameters may also be equal: 0<len(wordChars)<=len(chars)
Input to the WordBeamSearch.compute
method:
Word beam search is a CTC decoding algorithm. It is used for sequence recognition tasks like handwritten text recognition or automatic speech recognition.
The four main properties of word beam search are:
The following sample shows a typical use-case of word beam search along with the results given by five different decoders. Best path decoding and vanilla beam search get the words wrong as these decoders only use the noisy output of the optical model. Extending vanilla beam search by a character-level LM improves the result by only allowing likely character sequences. Token passing uses a dictionary and a word-level LM and therefore gets all words right. However, it is not able to recognize arbitrary character strings like numbers. Word beam search is able to recognize the words by using a dictionary, but it is also able to correctly identify the non-word characters.
More information:
extras/prototype/
extras/tf/
Please cite the following paper if you are using word beam search in your research work.
@inproceedings{scheidl2018wordbeamsearch,
title = {Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm},
author = {Scheidl, H. and Fiel, S. and Sablatnig, R.},
booktitle = {16th International Conference on Frontiers in Handwriting Recognition},
pages = {253--258},
year = {2018},
organization = {IEEE}
}