Change the num_words_per_image without training again

No, you cannot predict the content of multiple words per image without retraining. The code could be used to do exactly that, but the text recognition model we provide works a little differently than you might think.

The model is configured to predict the content of one word with a maximum of 23 characters, but internally it works the other way round. We predict the locations of 23 words, each containing one character, and then assume that these 23 single-character words all belong to one word (this is the conceptual level!). We can then put our one word with 23 characters into the transformer and predict the textual content.
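To make the regrouping idea concrete, here is a minimal sketch in plain Python. It is not taken from the KISS code base: `MAX_CHARS`, `group_into_one_word`, and the dummy boxes are hypothetical names and values used only for illustration.

```python
# Conceptual sketch only; none of these names come from the KISS code base.
MAX_CHARS = 23  # maximum number of characters the provided model handles


def group_into_one_word(char_boxes):
    """Treat a flat list of single-character boxes as the characters of one word."""
    if len(char_boxes) != MAX_CHARS:
        raise ValueError(f"expected {MAX_CHARS} boxes, got {len(char_boxes)}")
    # One word whose entries are the 23 localized character regions.
    return [list(char_boxes)]


# Dummy (x1, y1, x2, y2) boxes standing in for the localization output.
boxes = [(i * 10, 0, i * 10 + 10, 32) for i in range(MAX_CHARS)]
words = group_into_one_word(boxes)
print(len(words), len(words[0]))  # 1 23
```

The grouping itself is trivial. The point is that the trained weights expect exactly this layout, which is why a different number of words per image requires retraining.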

You can, however, predict x words with a maximum of 23 characters each, but you'll need to retrain the model for this, since the current model was not trained for that.

We are not using spaces, since none of the words in the benchmark datasets include spaces. You could add a space character to the char_map, but you'll need to retrain the model with enough data that also contains spaces. I'm sorry, but this is one of the flaws of deep learning :/
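As a rough illustration of what adding a space would involve, here is a sketch that assumes the char_map is a mapping from class index to Unicode code point. That layout, the helper `add_character`, and the three-entry example map are assumptions for illustration; check the actual char_map file in the repository before adapting this.

```python
# Sketch under assumptions: the real char_map layout may differ from this one.
def add_character(char_map, char):
    """Return a copy of char_map with `char` appended as a new class."""
    code_point = ord(char)
    if code_point in char_map.values():
        return dict(char_map)  # already present, nothing to add
    next_index = max(char_map, default=-1) + 1
    extended = dict(char_map)
    extended[next_index] = code_point
    return extended


# Hypothetical three-class map: index -> Unicode code point.
char_map = {0: ord("a"), 1: ord("b"), 2: ord("c")}
extended = add_character(char_map, " ")
print(len(char_map), len(extended))  # 3 4
```

The extra entry adds one more class for the recognizer to predict, so the existing weights no longer match. That is why retraining on data that contains spaces is unavoidable.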

Bartzi / kiss
