githubharald closed this 10 months ago
Hi Harald!
Overall, I found that on short, fixed-length text (say, car plates or captchas, 5-10 characters), CTC loss reaches good accuracy much faster and usually ends up better, but "better" is really only about 1-2% if CE is given enough time to train. I'm not sure which one is more data hungry (or whether the two come out about equal); I never ran that experiment.
I implemented CTC loss first in this repo. Later I found out that most modern OCR systems for real-world text use CE, so I implemented that here too, but I never actually used it for a final model.
All clear, thanks a lot for explaining :-)
I had similar findings. I think CTC trains faster and is maybe a bit more accurate mostly because it gives the model outputs more freedom: the model is freer in where in the sequence it outputs a character, and it is also allowed to output a character repeatedly.
Makes sense. It's much more pleasant to look at CTC outputs than CE outputs, as you can kind of see that it's right and just needs the CTC logic applied. Even when CTC does not get the right answer, it looks much closer to it than CE, almost as if you could manually fix it without adding any content.
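To make the "more freedom" point concrete, here is a minimal sketch (not code from this repo) that brute-forces every frame-level output sequence that CTC would accept for a given target. The names `BLANK`, `collapse`, and `valid_alignments` are made up for illustration, and the blank symbol is assumed to be `-`.

```python
from itertools import product

BLANK = "-"  # assumed blank symbol, for illustration only


def collapse(path):
    """Apply the CTC collapse rule: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return "".join(out)


def valid_alignments(target, num_frames):
    """Enumerate all frame-level paths of length num_frames that collapse to target.

    Brute force over the target's characters plus the blank, so this is only
    practical for tiny examples.
    """
    alphabet = sorted(set(target)) + [BLANK]
    return [
        "".join(path)
        for path in product(alphabet, repeat=num_frames)
        if collapse(path) == target
    ]


paths = valid_alignments("ab", 4)
print(len(paths))  # 15 different frame-level outputs all count as "ab"
print(paths)
```

With 4 output frames, 15 different paths (such as `aabb`, `a-b-`, or `-ab-`) are all correct for the target `ab`, whereas a per-position CE target accepts exactly one character at each position.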
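The "CTC logic" mentioned here is usually best-path (greedy) decoding: take the most likely symbol per frame, merge repeats, then drop blanks. Below is a minimal sketch under assumptions of my own, not the repo's implementation: `ctc_greedy_decode` and `one_hot_scores` are hypothetical helpers, and the blank is assumed to sit at index 0 of the alphabet.

```python
def ctc_greedy_decode(scores, alphabet, blank_index=0):
    """Best-path CTC decoding.

    scores: list of per-frame score lists, one score per alphabet entry.
    alphabet: string whose i-th character is the symbol for class i.
    """
    # Pick the highest-scoring class in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in scores]

    decoded = []
    prev = None
    for idx in best:
        # Keep a symbol only if it differs from the previous frame and is not blank.
        if idx != prev and idx != blank_index:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)


def one_hot_scores(path, alphabet):
    """Hypothetical helper: build fake per-frame scores from a readable path string."""
    return [[1.0 if ch == symbol else 0.0 for symbol in alphabet] for ch in path]


alphabet = "-helo"  # index 0 is the assumed blank
print(ctc_greedy_decode(one_hot_scores("hh-e-ll-lo-", alphabet), alphabet))  # hello
print(ctc_greedy_decode(one_hot_scores("hhelllo", alphabet), alphabet))      # helo
```

The second call shows a typical near miss: without a blank between the two `l` frames the repeats merge, so the raw output still looks almost right even though the decoded text is wrong.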
Hi, thanks for sharing. I'm wondering whether you tested attention + CE loss vs. CRNN + CTC loss. Which one performs better on your data / in your experiments?