Closed: sdsawtelle closed this issue 4 years ago
Hi,
It seems that the neural network gives an extremely low score to the ground-truth text, e.g. if I feed the first image (920062) into the net, the probability of the text "920062" is 1e-45! Word beam search does not only select the most probable word, it also puts "non-word characters" between words, which in your case are all characters the NN can predict minus the digits. Better suited for choosing the single most probable word from a dictionary would be something like lexicon search. However, lexicon search needs best-path decoding to already give you roughly the correct word (a small edit distance of maybe 1 or 2), which is not the case in your example.
I would suggest retraining the neural network with some more realistic data and then trying lexicon search.
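For illustration only (this is not the implementation from the repository): lexicon search boils down to taking the best-path output and returning the dictionary entry with the smallest edit distance. A minimal sketch, with a made-up dictionary and decoded string:

```python
def edit_distance(a, b):
    """Plain Levenshtein distance, single-row dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def lexicon_search(best_path_text, dictionary):
    """Pick the dictionary word closest to the best-path decoding."""
    return min(dictionary, key=lambda word: edit_distance(best_path_text, word))

# Made-up example: a tiny dictionary of serial numbers and a noisy best-path result.
serials = ["920062", "731845", "118290"]
print(lexicon_search("9Z0O62", serials))  # -> "920062" (edit distance 2)
```

As noted above, this only helps if best-path decoding is already within an edit or two of the correct word.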
Thank you Harald for the response and for your hard work in creating and documenting these repos. We will try training (or at least tuning) with exclusively digit and digit-string data; hopefully that will drive up the raw probabilities of the digits. We will also look into lexicon search for the decoder layer. I'll close this issue for now but will continue adding comments if we make progress, so that it may point others in the right direction for similar use cases. Thank you again!
@sdsawtelle did you make any progress with your work? I am also trying to do similar work: I want to recognize dates in the 19/10/1993 format.
@hasansalimkanmaz we did make some good progress. We have almost no real labeled data for our task, so we pulled a handful of open-source single-digit datasets like MNIST and wrote some code to synthesize data for our use case (6-digit strings) by pasting the single-digit images together. We trained the model only on our synthetic data so that the vocabulary was purely digits rather than alphanumeric. We haven't tried a lexicon search layer yet; instead we just applied some simple business rules to the final output. The accuracy we obtained on our very small test set of true labeled data was quite good.
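In case it is useful to others, here is a rough sketch of the pasting step (not our actual code). It assumes the single-digit crops are already loaded as equally sized grayscale NumPy arrays keyed by digit character, which is easy to arrange with MNIST-style data, and the label handling is simplified:

```python
import random
import numpy as np

def synthesize_serial(digit_images, length=6):
    """Build one synthetic serial-number image and its label by pasting
    randomly chosen single-digit crops together, left to right.

    digit_images: dict mapping '0'..'9' to lists of 2D grayscale arrays,
                  all with the same height (e.g. 28 px for MNIST).
    """
    label = "".join(random.choice("0123456789") for _ in range(length))
    crops = [random.choice(digit_images[ch]) for ch in label]
    image = np.concatenate(crops, axis=1)  # horizontal concatenation
    return image, label

# Made-up usage with random arrays standing in for real MNIST crops:
fake_digits = {str(d): [np.random.randint(0, 255, (28, 28), dtype=np.uint8)]
               for d in range(10)}
img, lbl = synthesize_serial(fake_digits)
print(img.shape, lbl)  # e.g. (28, 168) and a 6-digit string such as '407913'
```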
Thank you very much for the information, @sdsawtelle.
I compiled `TFWordBeamSearch.so` and successfully incorporated it with `wordbeamsearch` as the decoderType in the SimpleHTR model (using TensorFlow 1.3 for both projects). My use case is to recognize 6-digit strings that come from a restricted set of 10 possibilities (they are aircraft serial numbers). So to conduct word beam search, in the SimpleHTR project I modified two files (placeholder examples at the end of this post):

- `data/corpus.txt` to contain only the 10 actual serial numbers
- `model/wordCharList.txt` to contain only the digits 0 through 9

I have a test set of 10 images of these 6-digit serial strings. When running the CRNN with the default best-path decoding, I naturally get a lot of the digits mislabeled as letters. I expected that word beam search decoding would improve the result, but now all the images are labeled by the CRNN as "." with probability 0.50333.
I am trying to understand whether this is related to my "words" consisting entirely of digits.
Attached is an example panel showing a few of the input images and the decoded labels that result from the default best-path decoding.
Attachments: corpus.txt, wordCharList.txt
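For anyone reproducing this setup, the two modified files were along these lines. The serial numbers here are placeholders rather than the real values in the attachments, and the whitespace-separated corpus plus single-line word-character list is my understanding of what SimpleHTR expects:

`data/corpus.txt` (placeholder serial numbers, separated by whitespace):

```
920062 731845 118290
```

`model/wordCharList.txt` (digits only):

```
0123456789
```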