Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0

Question about training. #60

Closed by billqxg 5 years ago

billqxg commented 5 years ago

My question is how the extracted text regions are aligned with the ground truth. Since the locations of the text are unavailable, how is a ground-truth label assigned to each extracted text region?

Bartzi commented 5 years ago

For the alignment, we make the following assumption: the ground truth for the textual content of the words in a given image is ordered from top to bottom and left to right. With this assumption we can train the network, as it implicitly learns to focus first on the top-left word and then on the remaining words in that order.

This is far from ideal, but it works for now.
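The ordering assumption above can be illustrated with a minimal sketch. It is not taken from the SEE code base: the `(word, x, y)` annotation format, the `line_tolerance` parameter, and the helper names `reading_order` and `pair_by_order` are all hypothetical. It assumes rough word positions are available while preparing the labels, and shows only how labels could be sorted into top-to-bottom, left-to-right order so that the k-th extracted region is matched with the k-th word.

```python
from typing import List, Tuple

# Hypothetical annotation format: (word, x, y) of each word's top-left corner.
Word = Tuple[str, float, float]


def reading_order(words: List[Word], line_tolerance: float = 10.0) -> List[str]:
    """Sort words top to bottom, then left to right within each line."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[2]):
        # A word joins the current line if its y is within the tolerance
        # of the first word of that line; otherwise it starts a new line.
        if lines and abs(word[2] - lines[-1][0][2]) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [w[0] for line in lines for w in sorted(line, key=lambda w: w[1])]


def pair_by_order(num_regions: int, labels: List[str]) -> List[Tuple[int, str]]:
    """Pair the k-th extracted region (by index) with the k-th label."""
    return list(zip(range(num_regions), labels))


# Made-up example: two words on the first line, one on the second.
words = [("Street", 120.0, 12.0), ("Main", 10.0, 15.0), ("Paris", 40.0, 80.0)]
labels = reading_order(words)
pairs = pair_by_order(3, labels)
```

During training no positions are used at all. Only the order of the labels matters, and the network is expected to learn to emit its regions in that same order.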

billqxg commented 5 years ago

Thanks a lot for your reply!