AdarshMJ closed this issue 6 years ago.
Yes, it uses both the image and the text of each sample to train the neural network. The loss value used for training is computed from the network's output for a given image and that image's ground-truth text. I recommend reading about CTC, since this is where all the magic happens (and it is also what confuses people most at the beginning).
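To make "loss from the network output and the ground-truth text" concrete, here is a toy, pure-Python sketch of the CTC forward algorithm. It is not the repository's code; a real training script would normally use a framework's built-in CTC loss. Assumptions (all hypothetical): the network output is a T x C list of per-timestep softmax probabilities, the ground-truth text is already mapped to integer character indices, and index 0 is the CTC blank.

```python
import math
from typing import Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_loss(probs: Sequence[Sequence[float]], label: Sequence[int], blank: int = BLANK) -> float:
    """Negative log-likelihood of `label` given per-timestep probabilities `probs` (T x C)."""
    # Extended label with blanks around every character: _ l1 _ l2 _ ... _
    ext = [blank]
    for c in label:
        ext += [c, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: summed probability of all paths ending at ext[s] at the current timestep
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                      # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]             # advance by one symbol
            # skip a blank only between two *different* characters
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last character or on the trailing blank
    total = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return -math.log(total)


# Two timesteps, classes [blank, 'a'], uniform output: paths "aa", "a_", "_a" all collapse to "a"
print(ctc_loss([[0.5, 0.5], [0.5, 0.5]], [1]))  # -ln(0.75)
```

The loss sums over every alignment that collapses to the ground-truth text, which is why no per-timestep character positions are needed in the training data. Multiplying raw probabilities underflows for long sequences, so practical implementations do the same recursion in log space.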
Here are two articles to get you started with CTC:
Thank you so much!
Thank you so much for this awesome tutorial. I have one more question: does the training code in the repository, main.py, use the image-text pairs for training, or just words.txt? I know it might seem like a silly question, but I'm a complete beginner, so I just wanted to clear this up.
Thank you