githubharald / CTCWordBeamSearch

Connectionist Temporal Classification (CTC) decoder with dictionary and language model.
https://towardsdatascience.com/b051d28f3d2e
MIT License

What does 'check length of chars and wordChars: 0<len(wordChars)<=len(chars)' mean? #47

Closed hasansalimkanmaz closed 4 years ago

hasansalimkanmaz commented 4 years ago

If you create a new issue, please provide the following information:

  1. Which program causes the problem

    • Custom TF operation
  2. Versions

    • TensorFlow version: Version: 1.15.0
    • Python version: 3.7
    • C++ compiler: gcc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
    • Operating system: Linux
  3. Issue: I couldn't understand this error message in your code. I got it during evaluation of the first epoch of training.

    
    Validate NN
    terminate called after throwing an instance of 'std::invalid_argument'
    what():  check length of chars and wordChars: 0<len(wordChars)<=len(chars)

My task is to recognize handwritten dates such as "19/10/1993", which contain only slashes and digits. Could that be the cause of the error?

How can I solve the issue?

githubharald commented 4 years ago

Hi,

can you show what you pass as chars and wordChars to the algorithm?

Just as a reminder:

In your case (just guessing) this could look like this:

So wordChars is a subset of chars, which means it can't contain more characters than chars, and it must not be empty. Therefore 0<len(wordChars)<=len(chars) must hold.
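To make the constraint concrete, here is a minimal Python sketch of the same check, which can be run before the strings are handed to the decoder. The function name `check_char_sets` and the example strings for the date task are hypothetical and not taken from the repository. Treating digits as word characters and the slash as a separator is only a guess at how the date task might be set up.

```python
def check_char_sets(chars: str, word_chars: str) -> None:
    """Raise ValueError if word_chars is not a non-empty subset of chars.

    Hypothetical helper: it mirrors the condition from the error message,
    0 < len(wordChars) <= len(chars), and adds an explicit subset test.
    """
    if not 0 < len(word_chars) <= len(chars):
        raise ValueError(
            "check length of chars and wordChars: "
            "0<len(wordChars)<=len(chars)"
        )
    # Every word character must also be a character the network can output.
    missing = sorted(set(word_chars) - set(chars))
    if missing:
        raise ValueError(f"wordChars contains characters not in chars: {missing}")


# Assumed setup for handwritten dates such as "19/10/1993":
chars = "0123456789/"      # everything the network can output (without the blank)
word_chars = "0123456789"  # characters that form "words"; "/" separates them

check_char_sets(chars, word_chars)  # passes silently
```

With an empty `word_chars`, the first condition fails, which corresponds to the message shown in the error above.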

One more thing: if possible, don't use word beam search while training, as it is slower than best path decoding. I would train using best path decoding, and if best path decoding gives good results, then word beam search should also give good results.
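For reference, best path decoding takes the most likely label at each time step, collapses repeated labels, and then drops the CTC blank. The sketch below is a plain Python illustration, not the code used in the repository. It assumes the network output is a list of per-time-step score rows and that the blank is the last column (index `len(chars)`). The function name `best_path_decode` is hypothetical.

```python
def best_path_decode(mat, chars):
    """Greedy (best path) CTC decoding of a T x (len(chars) + 1) score matrix.

    Assumption: the last column of each row is the CTC blank.
    """
    blank = len(chars)
    # 1. Most likely label index at each time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in mat]
    # 2. Collapse repeated labels, 3. remove blanks.
    decoded = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            decoded.append(chars[idx])
        prev = idx
    return "".join(decoded)


# Toy example with three characters plus the blank as the fourth column.
chars = "19/"
mat = [
    [0.90, 0.05, 0.03, 0.02],  # '1'
    [0.90, 0.05, 0.03, 0.02],  # '1' again, collapsed with the previous step
    [0.10, 0.10, 0.10, 0.70],  # blank
    [0.10, 0.80, 0.05, 0.05],  # '9'
    [0.10, 0.10, 0.70, 0.10],  # '/'
]
print(best_path_decode(mat, chars))  # prints "19/"
```

Because this only needs one argmax per time step, it is much cheaper than a beam search, which is why it suits validation during training.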

hasansalimkanmaz commented 4 years ago

Thanks for the crystal-clear explanation.

It worked and improved my accuracy.