Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0

Understanding char_map #32

Closed by cpmgs 6 years ago

cpmgs commented 6 years ago

Thanks for the software.

I was wondering if you could explain (in simple terms), or point to a resource that explains, how to interpret the char_map? Looking at the ctc_char_map.json in small_dataset/ (if you download the text recognizer model), it makes little sense to me what is going on. For example, 10 is mapped to 57, but what do 10 and 57 represent, and where is that defined?

Bartzi commented 6 years ago

You have to think of the char map as a dictionary for a language. The network predicts one of n classes, and each class has a certain semantic meaning. To easily map each class to its meaning, we use the char_map.

It is called char_map because each class is mapped to a character. Because strings are not nice to work with in NumPy, we map each predicted class to the Unicode code point of a character. That means that the number 57 represents the character 9 (because chr(57) == '9'). Thus, if the network predicts class 10, it is saying that it found a 9 in the image.

So for each possible output of the network we have a corresponding character, which is encoded as its Unicode code point.
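As a rough illustration of that lookup, here is a minimal sketch in Python. It assumes the char map is a JSON object whose keys are class indices and whose values are Unicode code points. Only the 10 -> 57 entry comes from this thread; the 9 -> 56 entry, the variable names, and the `decode` helper are made up for the example and are not taken from the repository.

```python
import json

# Hypothetical excerpt in the style of a char map file.
# Only the "10": 57 entry is mentioned in this thread; "9": 56 is an assumed neighbour.
char_map_json = '{"9": 56, "10": 57}'

# JSON object keys are always strings, so convert them to integer class indices.
char_map = {int(cls): code_point for cls, code_point in json.loads(char_map_json).items()}


def decode(classes, char_map):
    """Turn a sequence of predicted class indices into a string (illustrative helper)."""
    return "".join(chr(char_map[cls]) for cls in classes)


# Class 10 -> code point 57 -> '9'; class 9 -> code point 56 -> '8'.
decoded = decode([10, 9, 10], char_map)
print(decoded)
```

Keeping the values as integers means predictions and labels can stay in plain integer arrays, and `chr` is only needed at the very end when a readable string is wanted.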

cpmgs commented 6 years ago

That was just the link I needed. Thank you.