Bartzi / stn-ocr

Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition
https://arxiv.org/abs/1707.08831
GNU General Public License v3.0

I read the paper and have a question: what is the order of the labels? #19

Closed jacobunderlinebenseal closed 6 years ago

jacobunderlinebenseal commented 6 years ago

Assume there are N lines of text in an image (in the order "aaa", "bbb", "ccc", ...), each with a bbox. After the LocalizationNetwork there are N affine transformation matrices (maybe in the order "ccc", "bbb", "aaa"), but how do you decide which is which? If they are not aligned, how can the network be trained? Or is there just a prescribed order, like from top to bottom? And what happens if the number of bboxes in the image is less or more than N?

Bartzi commented 6 years ago

Yes, the labels need to be ordered: from left to right and top to bottom. You could also go the other way round. Just make sure the labelling is consistent, otherwise it won't work with the current code. The label ordering forces an order constraint on the predictions of the localisation network.
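
As a rough illustration of a consistent ordering, here is a minimal sketch that sorts annotated boxes top to bottom, then left to right, before the label list is built. The `Box` tuple layout and the `order_labels` helper are hypothetical and not part of the stn-ocr code; the sketch also assumes that boxes on the same text line share the same `y` value.

```python
from typing import List, Tuple

# Hypothetical annotation format: (x, y, width, height, text),
# with (x, y) the top-left corner of the box.
Box = Tuple[float, float, float, float, str]


def order_labels(boxes: List[Box]) -> List[str]:
    """Return the texts sorted top to bottom, then left to right."""
    # Sort by the top edge first, then by the left edge.
    ordered = sorted(boxes, key=lambda box: (box[1], box[0]))
    return [box[4] for box in ordered]


if __name__ == "__main__":
    annotations = [
        (50.0, 10.0, 30.0, 12.0, "bbb"),
        (0.0, 10.0, 30.0, 12.0, "aaa"),
        (0.0, 40.0, 30.0, 12.0, "ccc"),
    ]
    print(order_labels(annotations))
```

Whatever rule is chosen, it has to be applied identically to every training sample.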

Having a way to align labels to predictions without forcing any constraint on the order would be cool, but so far I have had no idea how to do this.

If the number is less than N, you will need to pad the label vector with blank label words, and the network will learn to predict these blank labels. If there are more bboxes, the network will hopefully predict the first N regions.
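
A minimal sketch of the padding step, under a few assumptions that are not taken from the stn-ocr code: the blank label has index `0`, every word is encoded to a fixed number of characters, and surplus words beyond the first N are simply dropped from the label. The names `BLANK`, `encode_word`, and `build_label_vector` are hypothetical.

```python
from typing import Dict, List

BLANK = 0  # assumed index of the blank label; check your own char map


def encode_word(word: str, char_map: Dict[str, int], max_chars: int) -> List[int]:
    """Encode one word and pad it with blanks up to max_chars."""
    codes = [char_map[char] for char in word[:max_chars]]
    return codes + [BLANK] * (max_chars - len(codes))


def build_label_vector(
    words: List[str],
    char_map: Dict[str, int],
    num_regions: int,
    max_chars: int,
) -> List[List[int]]:
    """Return exactly num_regions encoded words for one image."""
    # Keep only the first N words (an assumption for images with more boxes).
    rows = [encode_word(word, char_map, max_chars) for word in words[:num_regions]]
    # Fill the remaining slots with all-blank "words".
    rows += [[BLANK] * max_chars for _ in range(num_regions - len(rows))]
    return rows


if __name__ == "__main__":
    char_map = {"a": 1, "b": 2}
    print(build_label_vector(["ab", "b"], char_map, num_regions=3, max_chars=3))
```

The words passed in should already be in the consistent order used for the whole dataset.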

jacobunderlinebenseal commented 6 years ago

Thanks a lot for answering so clearly and so quickly. Best wishes.