bgshih / crnn

Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.
MIT License

How does CTC deal with successive doubled characters? #77

Closed ahmedmazari-dhatim closed 7 years ago

ahmedmazari-dhatim commented 7 years ago

Hello,

F(-c-a-t-) = F(ccaa-t-) = · · · = cat

For this case it's fine. Now let's consider the following cases:

F(-a-ll-) = F(-all) = · · · = al  
F(-l-e-t-ee-r-s) = F(-le-t-eers) = · · · = leters  

Here, however, the doubled letters are not duplicates that should be merged: all has a genuine double l and letters a genuine double t. How does CTC deal with that?

Thank you!

da03 commented 7 years ago

F(-a-l-l-) = all. I'd suggest reading the original CTC paper.
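For reference, the mechanism in the original CTC paper (Graves et al., 2006) works on an extended label: a blank is inserted at both ends and between every pair of characters, and the forward recursion may skip over an intermediate blank only when the characters on either side of it differ. The sketch below is a minimal plain-Python illustration of that recursion, in probability space rather than the log space a real implementation would use. The `-` blank, the one-dict-per-frame input format, the made-up probabilities and the function names are all assumptions for this example, not CRNN's code; the result is checked against brute-force summation over every path.

```python
from itertools import product

BLANK = "-"  # assumed blank symbol, as in this thread


def ctc_collapse(path, blank=BLANK):
    """CTC map F: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


def ctc_forward(probs, label, blank=BLANK):
    """p(label | x) by the CTC forward (alpha) recursion.
    probs: one dict per frame mapping symbol -> probability (hypothetical format)."""
    ext = [blank]
    for ch in label:
        ext += [ch, blank]  # extended label: - l1 - l2 - ... - lU -
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]  # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1][s - 1]  # advance by one position
            # Skip the blank at s-1 only if ext[s] is a character that differs
            # from ext[s-2]; for "ll" this is forbidden, so a blank must separate them.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    # valid paths end on the last character or on the trailing blank
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)


def brute_force(probs, label, blank=BLANK):
    """Reference: sum the probability of every path that F maps to label.
    Assumes every frame dict has the same symbol keys."""
    total = 0.0
    for path in product(list(probs[0]), repeat=len(probs)):
        if ctc_collapse(path, blank) == label:
            p = 1.0
            for t, sym in enumerate(path):
                p *= probs[t][sym]
            total += p
    return total


if __name__ == "__main__":
    probs = [  # made-up per-frame distributions over a, l and the blank
        {"a": 0.5, "l": 0.2, "-": 0.3},
        {"a": 0.3, "l": 0.4, "-": 0.3},
        {"a": 0.1, "l": 0.6, "-": 0.3},
        {"a": 0.1, "l": 0.3, "-": 0.6},
        {"a": 0.1, "l": 0.7, "-": 0.2},
    ]
    for label in ["al", "all"]:
        print(label, ctc_forward(probs, label), brute_force(probs, label))
    print(ctc_forward(probs[:3], "all"))  # 0.0: "all" needs at least 4 frames
```

With only three frames, `ctc_forward` gives exactly 0.0 for all: the forbidden skip between the two l's means no three-frame path can reach the end of the extended label.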

ahmedmazari-dhatim commented 7 years ago

@da03, what if it's F(-a-ll-)? Then:

Is F(-a-ll-) = all or al?

da03 commented 7 years ago

That's al. Intuitively, for two successive frames to produce two separate output symbols, the symbols must either be different or be separated by a blank. So F(ab) = ab, F(aa) = a, and F(a-a) = aa.
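The collapse map F described here fits in a few lines of Python. This is a minimal sketch assuming, as in this thread, that `-` is the blank; the name `ctc_collapse` is hypothetical. Repeats are merged before blanks are removed, which is why a-a survives as aa while aa becomes a.

```python
BLANK = "-"  # assumed blank symbol, as in this thread


def ctc_collapse(path, blank=BLANK):
    """CTC map F: merge runs of repeated symbols, then drop blanks."""
    out, prev = [], None
    for sym in path:
        # emit a symbol only when it differs from the previous frame
        # and is not the blank
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


if __name__ == "__main__":
    for p in ["ab", "aa", "a-a", "-a-l-l-", "-a-ll-", "ccaa-t-"]:
        print(f"F({p}) = {ctc_collapse(p)}")
    # F(ab) = ab, F(aa) = a, F(a-a) = aa,
    # F(-a-l-l-) = all, F(-a-ll-) = al, F(ccaa-t-) = cat
```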

ahmedmazari-dhatim commented 7 years ago

Hi @da03 ,

Thanks for the clarification.

This seems strange: F(ab) = ab, but F(aa) = a rather than aa. For F(ab) = ab, a and b are adjacent characters coming from different columns of the image, but F(aa) = a seems to mean that CTC treats both a's as the same column (overlapping characters)?

How do you explain the fact that `F(ab) = ab`, `F(aa) = a` (not `aa`), and `F(a-a) = aa`?

It's as if we said that two successive symbols must either be different or be separated by a blank. But a word can contain two identical successive letters, such as tt in letter, and the output should still be letter.
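A small brute-force check of the letter case, assuming the same `-` blank: the six frames `letter` collapse to leter, and a word with k pairs of identical neighbouring letters needs at least len(word) + k frames. `min_frames` and `alignments` are hypothetical helpers for illustration, not part of CRNN, and the enumeration is exponential in the number of frames, so it only suits tiny examples.

```python
from itertools import product

BLANK = "-"  # assumed blank symbol, as in this thread


def ctc_collapse(path, blank=BLANK):
    """CTC map F: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


def min_frames(label):
    """Fewest frames that can emit label: one per character, plus one blank
    between every pair of identical neighbouring characters."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def alignments(label, num_frames, blank=BLANK):
    """Brute force: every frame sequence of length num_frames that F maps to label.
    Only the label's own characters plus the blank can appear in such a path."""
    alphabet = sorted(set(label)) + [blank]
    return ["".join(p) for p in product(alphabet, repeat=num_frames)
            if ctc_collapse(p, blank) == label]


if __name__ == "__main__":
    print(ctc_collapse("letter"))   # leter: the two t's merge
    print(min_frames("letter"))     # 7
    print(alignments("letter", 7))  # ['let-ter']
    print(alignments("letter", 6))  # []: six frames are not enough
```

With exactly seven frames, the only way to emit letter is let-ter, with a blank between the two t's, which is the rule da03 describes.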

da03 commented 7 years ago

Haha, that might seem strange, but each point in the conv net's feature map actually corresponds to a region smaller than an actual character (and even smaller than the space between consecutive characters), so when we move from one point to the next in the feature map, both points may come from the same underlying character. For example, assume that one character spans two points in the feature map while one space spans one point (I mean the space between consecutive characters, such as the gap between the two a's in aa). Then for the sequence "aa" we would predict two occurrences of "a", then one blank for the space, followed by two more occurrences of "a", i.e. aa-aa, which F maps to aa. (This example is simplified: we actually predict probabilities, and we do not need to force each point in the feature map to be smaller than the space, because if we see the right half of one a followed by the left half of another a, we know there should be a space between them. But the intuition is there.)
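The idealised scenario above can be simulated directly. This sketch assumes each character covers `char_width` feature-map columns and each inter-character gap covers `gap_width` columns labelled as blank (`-`); the helper names and the per-column probabilities are made up for illustration. Decoding uses greedy best-path decoding (argmax per column, then F), which is one common CTC decoder and not necessarily the one CRNN uses.

```python
BLANK = "-"  # assumed blank symbol, as in this thread


def ctc_collapse(path, blank=BLANK):
    """CTC map F: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


def frames_for(text, char_width=2, gap_width=1, blank=BLANK):
    """Hypothetical ideal per-column labels: each character spans char_width
    columns and each gap between characters spans gap_width blank columns."""
    cols = []
    for i, ch in enumerate(text):
        if i > 0:
            cols.extend([blank] * gap_width)
        cols.extend([ch] * char_width)
    return "".join(cols)


def best_path_decode(probs, symbols, blank=BLANK):
    """Greedy decoding: most probable symbol in each column, then F."""
    path = [symbols[max(range(len(col)), key=col.__getitem__)] for col in probs]
    return ctc_collapse(path, blank)


if __name__ == "__main__":
    print(frames_for("aa"), "->", ctc_collapse(frames_for("aa")))  # aa-aa -> aa
    no_gap = frames_for("aa", gap_width=0)
    print(no_gap, "->", ctc_collapse(no_gap))  # aaaa -> a
    ab = frames_for("ab", gap_width=0)
    print(ab, "->", ctc_collapse(ab))  # aabb -> ab
    # made-up column probabilities over the symbols ["a", "-"]
    probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.85, 0.15], [0.9, 0.1]]
    print(best_path_decode(probs, ["a", "-"]))  # aa
```

Without a blank column between them, the two a's merge into one, while a and b need no separator; this matches F(aa) = a, F(a-a) = aa and F(ab) = ab from earlier in the thread.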