Question about performance of Attention + CE vs CRNN + CTC

GabrielDornelles / pytorch-ocr

Simple Pytorch framework to train OCRs. Supports CRNNs, Attention, CTC and Cross Entropy Loss.

MIT License

Question about performance of Attention + CE vs CRNN + CTC #10

Closed: githubharald closed this issue 10 months ago

githubharald commented 10 months ago

Hi, thanks for sharing. I'm wondering if you tested attention + CE loss vs CRNN + CTC loss, which one performs better on your data/in your experiments?

GabrielDornelles commented 10 months ago

Hi Harald!

Overall, I found that on short, fixed-length text (say car plates or captchas, around 5-10 characters), CTC loss reaches good accuracy much faster and usually ends up better, but by "better" I mean only about 1-2% if CE is given enough time to train. I'm not sure which one is more data hungry (or whether the two come out about equal); I never ran that experiment.

I implemented the CTC loss first in this repo. Later on I found out that most modern OCRs for real-world text use CE, so I implemented it here as well, but I never really used it myself for any final model.

githubharald commented 10 months ago

All clear, thanks a lot for explaining :-)

githubharald commented 10 months ago

I had similar findings. I think CTC trains faster, and is maybe a bit more accurate, mostly because it gives the model outputs more freedom: the model is free to choose where in the sequence it outputs each character, and it is also allowed to output the same character repeatedly.
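To make the "more freedom" point concrete, here is a minimal sketch of the standard CTC collapse rule (merge consecutive repeats, then drop blanks). It is not code from this repo: the `ctc_collapse` helper and the use of `-` as the blank symbol are assumptions made for illustration only. It shows that many different frame-level outputs map to the same final text, so the model is not forced to place each character at one exact position.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol, chosen only for this illustration


def ctc_collapse(path: str, blank: str = BLANK) -> str:
    """Apply the CTC collapse rule: merge consecutive repeats, then drop blanks."""
    merged = (ch for ch, _ in groupby(path))  # "aab-c" -> a, b, -, c
    return "".join(ch for ch in merged if ch != blank)


# Several different per-frame outputs that all decode to the same text.
paths = ["aab-c---", "-a-bbbcc", "a-b----c", "---abc--"]
for p in paths:
    print(p, "->", ctc_collapse(p))

# A genuinely doubled letter needs a blank between the two copies,
# otherwise the repeats are merged into one character.
print(ctc_collapse("hel-lo"))
print(ctc_collapse("hello"))
```

With an attention + CE decoder, by contrast, each output step is trained against one specific target character, so there is no comparable set of equally valid alignments.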

GabrielDornelles commented 10 months ago

Makes sense. It's much more pleasant to look at CTC outputs than CE outputs: you can kind of see that the raw output is right and only needs the CTC logic applied. Even when CTC doesn't get the right answer, its output looks much closer to it than CE's does, almost as if you could manually fix it without adding any new content.