Bartzi / kiss

Code for the paper "KISS: Keeping it Simple for Scene Text Recognition"
GNU General Public License v3.0

Change the num_words_per_image without training again #13

Open vamsiadari95 opened 4 years ago

vamsiadari95 commented 4 years ago

Can we predict multiple words from a single image by changing num_words_per_image? I tried changing it in the recognizer_class in the evaluate.py file, but I am getting this error:

InvalidType: 
Invalid operation is performed in: Reshape (Forward)

Expect: prod(x.shape) % known_size(=3072) == 0
Actual: 1536 != 0
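
The check that fails here can be reproduced with plain arithmetic. The sketch below is not Chainer code: `can_reshape` and the example shapes are hypothetical, chosen only so that the numbers match the message above (an array with 1536 elements and a target whose known dimensions multiply to 3072). It shows why a reshape with one free (-1) dimension is rejected when the element count is not a multiple of the known size.

```python
from math import prod


def can_reshape(input_shape, target_shape):
    """Return True if an array of input_shape fits target_shape (at most one -1)."""
    total = prod(input_shape)
    known_size = prod(d for d in target_shape if d != -1)
    if -1 in target_shape:
        # The free dimension must come out as a whole number.
        return known_size > 0 and total % known_size == 0
    return total == known_size


# Hypothetical shapes: 6 * 256 = 1536 elements in the array.
features = (6, 256)

# Target with known size 12 * 256 = 3072, as in the error message.
print(prod(features) % (12 * 256))           # the remainder reported as "Actual"
print(can_reshape(features, (-1, 12, 256)))  # False
print(can_reshape(features, (-1, 6, 256)))   # True
```

In other words, the array holds only half as many values as one group of the requested shape needs, so no whole number fits the free dimension.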

Also, may I ask why there are no spaces in char_map? (Adding them might make it possible to predict multiple words in an image.)

Bartzi commented 4 years ago

No, you cannot predict the content of multiple words per image without retraining. The code could be used to do exactly that, but the text recognition model provided by us does things a little differently than you might think.

It is configured to predict the content of one word with a maximum of 23 characters, but internally it works the other way round. We predict the locations of 23 words (each containing one character) and then assume that all of these single-character words belong to one and the same word (this is the conceptual level!). We can then feed our one word with 23 characters into the transformer and predict the textual content.
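
To make the bookkeeping concrete, here is a minimal sketch of that idea in plain Python. It is not the code from this repository: `group_regions_into_words` and the string placeholders for regions are hypothetical, and the real model works on feature tensors. The sketch only shows that 23 localized regions can be read as 1 word with 23 characters, and that asking for 2 words with 23 characters each would require 46 regions, which the trained localization network does not produce.

```python
def group_regions_into_words(regions, num_words, chars_per_word):
    """Split a flat list of localized regions into num_words groups."""
    expected = num_words * chars_per_word
    if len(regions) != expected:
        raise ValueError(f"expected {expected} regions, got {len(regions)}")
    return [
        regions[i * chars_per_word:(i + 1) * chars_per_word]
        for i in range(num_words)
    ]


# The provided model localizes 23 regions, one character each.
regions = [f"region_{i}" for i in range(23)]

# Interpreted as a single word with up to 23 characters.
words = group_regions_into_words(regions, num_words=1, chars_per_word=23)
print(len(words), len(words[0]))  # 1 23

# Two words of 23 characters would need 46 regions.
try:
    group_regions_into_words(regions, num_words=2, chars_per_word=23)
except ValueError as error:
    print(error)  # expected 46 regions, got 23
```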

You can, however, predict x words with a maximum of 23 characters each, but you'll need to retrain the model for this, since the current model was not trained for that setup.

We are not using spaces because none of the words in the benchmark datasets contain spaces. You could add a space character to the char_map, but you'll need to retrain the model with enough data that also contains spaces. I'm sorry, but this is one of the flaws of deep learning :/
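
As a rough illustration of why adding a space forces retraining, the sketch below extends a toy character map. It assumes a char_map that maps class indices to Unicode code points; the actual file in this repository may be laid out differently, and `add_character` is a hypothetical helper. The point is only that a new character means a new class index, so the number of output classes grows and the trained weights no longer match.

```python
def add_character(char_map, character):
    """Return a copy of char_map with `character` appended as a new class."""
    code_point = ord(character)
    if code_point in char_map.values():
        return dict(char_map)  # already present, nothing to add
    new_index = max(char_map, default=-1) + 1
    extended = dict(char_map)
    extended[new_index] = code_point
    return extended


# Toy map for illustration only (class index -> code point).
toy_char_map = {0: ord("a"), 1: ord("b")}

extended_map = add_character(toy_char_map, " ")
print(len(toy_char_map), len(extended_map))  # 2 3
print(extended_map[2])                       # 32, the code point of a space
```

A model trained with the original map has one output class per original entry, so the extra class for the space has no trained weights until the model is retrained on data that contains spaces.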