WordBeamSearch Decoder - Githubissues

githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.

https://towardsdatascience.com/2326a3487cd5

MIT License

WordBeamSearch Decoder #48

Closed ritzyag closed 5 years ago

ritzyag commented 5 years ago

I have trained the model from scratch to recognize six-digit numbers in images. I have also replaced the contents of corpus.txt with the list of valid six-digit numbers that the model is allowed to output. However, when I run validation with the flags --validate and --wordbeamsearch, the model still predicts words that are not in corpus.txt. Using wordbeamsearch should restrict the output to the dictionary words defined in corpus.txt, but that is not what happens: the model also predicts four- and five-digit numbers, which are not part of the corpus.

Why is this happening? Am I missing something?

Thank You :)

ritzyag commented 5 years ago

Also, why does using the flag --wordbeamsearch during training significantly reduce the accuracy? --wordbeamsearch is only used as a decoder, isn't it? Why should it affect the training process?

githubharald commented 5 years ago

This is how beam search works: it adds at most one character per iteration to a beam. As a result, the last word in a beam may still be incomplete when the iteration stops. In your case, this means that the only word (the number) might be missing some digits.

While training, you want to validate your neural network and not your language model, so it is better to use best path decoding.
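
For reference, best path decoding takes the most likely label at each time step, merges repeated labels, and drops the CTC blank, so no dictionary or language model is involved. A minimal sketch in plain Python, not the repository's code: the function name, the `probs` layout (one list of scores per time step), and the choice of the last index as the blank label are assumptions made for illustration.

```python
def best_path_decode(probs, chars):
    """Greedy CTC decoding: argmax per time step, merge repeats, drop blanks."""
    blank = len(chars)  # assumption: the blank label is the last index
    # Most likely label index at every time step.
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]
    decoded = []
    prev = None
    for label in best:
        # Keep a label only if it differs from the previous one and is not blank.
        if label != prev and label != blank:
            decoded.append(chars[label])
        prev = label
    return "".join(decoded)


chars = "01"
probs = [
    [0.1, 0.8, 0.1],    # '1'
    [0.1, 0.7, 0.2],    # '1' again, merged with the previous step
    [0.1, 0.1, 0.8],    # blank
    [0.2, 0.7, 0.1],    # '1' after a blank, kept as a new character
    [0.9, 0.05, 0.05],  # '0'
]
print(best_path_decode(probs, chars))  # prints 110
```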

ritzyag commented 5 years ago

But shouldn't the beam search perform a final dictionary check and output the closest word found in the dictionary?

Also, can you share a link to the article that explains how the wordbeamsearch decoder used in the code works?

Thanks again ! :)

githubharald commented 5 years ago

If there is an unfinished word in the beam, it gets completed if this is possible, i.e. if there is no ambiguity. If your dictionary contains "1234", "1235", ... it is not clear which word to pick for an unfinished word "123". If you only want to keep beams with finished words, you have to change the code around here. The article is linked in the references section of the README.
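
The completion rule described above can be illustrated with a small sketch. This is not the decoder's actual implementation; `complete_word` and the example dictionary are hypothetical and only show why an ambiguous prefix cannot be completed.

```python
def complete_word(prefix, dictionary):
    """Return the dictionary word for `prefix` if it is unambiguous, else None."""
    if prefix in dictionary:
        return prefix  # already a finished word
    candidates = [word for word in dictionary if word.startswith(prefix)]
    # Complete only when exactly one dictionary word matches the prefix.
    return candidates[0] if len(candidates) == 1 else None


dictionary = {"123456", "123457", "987654"}  # hypothetical corpus of six-digit numbers
print(complete_word("98765", dictionary))   # prints 987654 (only one match)
print(complete_word("12345", dictionary))   # prints None (two possible completions)
print(complete_word("123456", dictionary))  # prints 123456 (already finished)
```

In this sketch, a caller that only wants finished words could discard every result for which the function returns None.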