githubharald / CTCWordBeamSearch

Connectionist Temporal Classification (CTC) decoder with dictionary and language model.
https://towardsdatascience.com/b051d28f3d2e
MIT License

question about feedMat #27

Closed by hxk11111 5 years ago

hxk11111 commented 5 years ago

Hi @githubharald, thanks for your project. I have a question about the matrix fed into the TF session. I am training a CRNN+CTC model. As an example, take an image that shows the text "x181208022". Before the CTC layer I have the RNN output; if I use greedy decoding on it, I get "--x-11-8-1-2-0-8--0-2-2---", where "-" represents the CTC blank. If I want to use your project, should I just feed the RNN output matrix into the word beam search part? I ask because I saw this in your testing code:

blank = len(chars)
s = ''
batch = 0
for label in res[batch]:
    if label == blank:
        break
    s += chars[label] 

The for loop breaks as soon as it meets a CTC blank. But in my case the CTC blank does not mark the end of a word, so breaking there would give the wrong result.

githubharald commented 5 years ago

Hi,

the blank is only used to indicate the end of the resulting string of the CTC decoder (if it is shorter than the output of the RNN layers). So, it would e.g. return "Hello-----", where only the string before the first blank is relevant.

P.S.: the output of your greedy decoder should not contain blanks between characters. It seems that it only applies step (1) of greedy decoding: computing the list of characters with the highest score along the x-axis of the image. The remaining steps, collapsing repeated characters and then removing the blanks, are missing (for more details, see "Best path decoding" in this article).
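For reference, here is a minimal sketch of full best path (greedy) decoding in plain Python. It is not code from this repository: the function names `collapse_labels` and `best_path_decode` are made up for illustration, and it assumes the blank is the last class of the RNN output, matching `blank = len(chars)` in the snippet quoted above.

```python
def collapse_labels(labels, blank):
    """Merge runs of repeated labels, then drop every blank."""
    merged = [l for i, l in enumerate(labels) if i == 0 or l != labels[i - 1]]
    return [l for l in merged if l != blank]


def best_path_decode(mat, chars):
    """Decode one sample.

    mat:   list of T rows (one per time step), each holding len(chars) + 1
           scores. Assumption: the last entry of each row is the blank.
    chars: string of all characters the model can output.
    """
    blank = len(chars)
    # Step 1: most likely label at each time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in mat]
    # Steps 2 and 3: merge repeats, remove blanks, map labels to characters.
    return ''.join(chars[l] for l in collapse_labels(best, blank))


# Toy input: 6 time steps, characters "ab" plus blank.
mat = [
    [0.1, 0.1, 0.8],  # blank
    [0.8, 0.1, 0.1],  # a
    [0.7, 0.2, 0.1],  # a (repeat, merged with the previous step)
    [0.1, 0.1, 0.8],  # blank (separates the two a's)
    [0.6, 0.3, 0.1],  # a
    [0.1, 0.8, 0.1],  # b
]
print(best_path_decode(mat, 'ab'))  # aab

# The step-(1)-only string from the question, using '-' as the blank.
path = '--x-11-8-1-2-0-8--0-2-2---'
print(''.join(collapse_labels(list(path), '-')))  # x181208022
```

Note that repeats are merged before blanks are removed. This order is what lets a blank between two identical characters (as in "2-2") keep both of them in the result.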

hxk11111 commented 5 years ago

Many Thanks