Closed ritzyag closed 5 years ago
I have trained the model from scratch to recognize six-digit numbers in images. I have also replaced the contents of corpus.txt with the list of valid six-digit numbers that the model is allowed to output. Now, when I run validation with the flags --validate and --wordbeamsearch, the model still predicts words that are not in corpus.txt. Using word beam search should restrict the output of my model to the dictionary words defined in corpus.txt, but this does not happen. My model also predicts four- and five-digit numbers, which are not part of the corpus.
Why is this happening? Am I missing something?
Thank You :)
Also, why does using the flag --wordbeamsearch during training significantly reduce the accuracy? --wordbeamsearch is used only as a decoder, isn't it? Why should it affect the training process?
This is the way beam search works: it adds at most one character per iteration to a beam. As a result, the last word of a beam can be incomplete when the iteration stops. In your case, this means that the only word (number) might be missing some digits.
While training, you want to validate your neural network and not your language model, so it is better to use best path decoding.
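For reference, best path decoding for a CTC output can be sketched as follows. This is a minimal illustration and not the repository's code: the function name best_path_decode, the list-of-lists input format, and the convention that the blank label is the last index are all assumptions made for this example.

```python
def best_path_decode(mat, chars):
    """Greedy CTC decoding (illustrative sketch, not the repository's code).

    mat:   list of T rows, each holding len(chars) + 1 scores;
           the last index is ASSUMED to be the CTC blank label.
    chars: string of characters the network can output.
    """
    blank = len(chars)
    # Pick the most likely label at every time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in mat]
    out = []
    prev = None
    for idx in best:
        # Collapse repeated labels, then drop blanks.
        if idx != prev and idx != blank:
            out.append(chars[idx])
        prev = idx
    return "".join(out)


# Toy example with two characters plus blank and five time steps.
toy = [
    [0.8, 0.1, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
]
print(best_path_decode(toy, "01"))
```

Because this decoder uses no dictionary, its accuracy reflects only what the network itself has learned.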
But shouldn't the beam search perform a final dictionary check and output the closest word found in the dictionary?
Also, can you share the link to the article that explains how the wordbeamsearch decoder used in the code works?
Thanks again ! :)
If there is an unfinished word in the beam, it gets completed if this is possible, i.e. if there is no ambiguity. If you have a dictionary containing "1234", "1235", ... it is not clear which word to pick for an unfinished word "123". If you only want to keep beams with finished words, you have to change the code around here. The article is linked in the references section of the README.
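The completion rule and the "finished words only" filter can be sketched like this. It is a hypothetical illustration under stated assumptions, not the decoder's actual implementation: the helper names complete_if_unambiguous and keep_finished, and the use of a plain Python list as the dictionary, are invented for this example.

```python
def complete_if_unambiguous(prefix, words):
    """Return the single dictionary word starting with prefix, else prefix unchanged.

    Hypothetical helper: the real decoder may organize its dictionary differently.
    """
    matches = [w for w in words if w.startswith(prefix)]
    return matches[0] if len(matches) == 1 else prefix


def keep_finished(beam_texts, words):
    """Keep only beam texts that are complete dictionary words (hypothetical filter)."""
    allowed = set(words)
    return [t for t in beam_texts if t in allowed]


corpus = ["123456", "123457", "654321"]

# "6543" has exactly one continuation, so it can be completed.
print(complete_if_unambiguous("6543", corpus))
# "1234" matches two words, so it stays unfinished.
print(complete_if_unambiguous("1234", corpus))
# Dropping unfinished beams leaves only full six-digit numbers.
print(keep_finished(["1234", "654321", "12345"], corpus))
```

With a corpus of many six-digit numbers, most short prefixes are ambiguous, which is one way four- and five-digit outputs can survive decoding.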