githubharald / CTCWordBeamSearch

Connectionist Temporal Classification (CTC) decoder with dictionary and language model.
https://towardsdatascience.com/b051d28f3d2e
MIT License
557 stars 160 forks source link
ctc decoder handwritten-text-recognition language-model recurrent-neural-networks speech-recognition text-recognition

CTC Word Beam Search Decoding Algorithm

Connectionist Temporal Classification (CTC) decoder with dictionary and Language Model (LM).

Installation

Usage

The following toy example shows how to use word beam search. The hypothetical model (e.g. a text recognition model) is able to recognize 3 different characters: "a", "b" and " " (whitespace). Words in that toy example can contain the characters "a" and "b" (but not " " which is the word separator). The language model is trained from a text corpus which contains only two words: "a" and "ba".

In this code snippet an instance of word beam search is created, and a TxBx(C+1) shaped numpy array is decoded:

import numpy as np
from word_beam_search import WordBeamSearch

corpus = 'a ba'  # two words "a" and "ba", separated by whitespace
chars = 'ab '  # the characters that can be recognized (in this order)
word_chars = 'ab'  # characters that form words

# RNN output
# 3 time-steps and 4 characters per time time ("a", "b", " ", CTC-blank)
mat = np.array([[[0.9, 0.1, 0.0, 0.0]], 
                [[0.0, 0.0, 0.0, 1.0]],
                [[0.6, 0.4, 0.0, 0.0]]]) 

# initialize word beam search (only do this once in your code)
wbs = WordBeamSearch(25, 'Words', 0.0, corpus.encode('utf8'), chars.encode('utf8'), word_chars.encode('utf8'))

# compute label string
label_str = wbs.compute(mat)

The decoder returns a list with a decoded label string for each batch element. To finally get the character strings, map each label to its corresponding character:

char_str = []  # decoded texts for batch
for curr_label_str in label_str:
    s = ''.join([chars[label] for label in curr_label_str])
    char_str.append(s)

Examples:

Documentation of parameters

Parameters of the constructor of the WordBeamSearch class:

Input to the WordBeamSearch.compute method:

Algorithm

Word beam search is a CTC decoding algorithm. It is used for sequence recognition tasks like handwritten text recognition or automatic speech recognition.

context

The four main properties of word beam search are:

The following sample shows a typical use-case of word beam search along with the results given by five different decoders. Best path decoding and vanilla beam search get the words wrong as these decoders only use the noisy output of the optical model. Extending vanilla beam search by a character-level LM improves the result by only allowing likely character sequences. Token passing uses a dictionary and a word-level LM and therefore gets all words right. However, it is not able to recognize arbitrary character strings like numbers. Word beam search is able to recognize the words by using a dictionary, but it is also able to correctly identify the non-word characters.

comparison

More information:

Extras

Citation

Please cite the following paper if you are using word beam search in your research work.

@inproceedings{scheidl2018wordbeamsearch,
    title = {Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm},
    author = {Scheidl, H. and Fiel, S. and Sablatnig, R.},
    booktitle = {16th International Conference on Frontiers in Handwriting Recognition},
    pages = {253--258},
    year = {2018},
    organization = {IEEE}
}

References