GabrielDornelles / pytorch-ocr

Simple Pytorch framework to train OCRs. Supports CRNNs, Attention, CTC and Cross Entropy Loss.
MIT License
70 stars 16 forks

Attention mechanism #3

Closed gyr66 closed 1 year ago

gyr66 commented 1 year ago

Hi! I am very interested in this project and I have learned a lot from it. While browsing the code, I was confused by this line: `x = hiddens * attention`. Should it be `x = attention` instead? I tried it on a captcha dataset and found that with `x = hiddens * attention` the accuracy is 1% after 20 epochs, while with `x = attention` the accuracy is about 77% after 20 epochs.
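For clarity, here is a minimal sketch of what the two expressions compute, assuming `hiddens` is the GRU output of shape `(batch, seq_len, hidden_dim)` and `attention` is a tensor of the same shape (the exact shapes and names are assumptions for illustration, not the repository's actual code):

```python
import torch

# Hypothetical tensors standing in for the repo's variables
hiddens = torch.randn(2, 10, 64)  # GRU output: (batch, seq_len, hidden_dim)
attention = torch.softmax(torch.randn(2, 10, 64), dim=1)  # assumed attention tensor

x_gated = hiddens * attention  # elementwise reweighting of the hidden states
x_replaced = attention         # discards hiddens, passes the attention tensor through
```

Both results have the same shape, so either line runs without error; the difference is purely in what information reaches the classifier.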

Thank you!

GabrielDornelles commented 1 year ago

Hi @gyr66 ! Happy to know you found the repository useful.

About the attention mechanism: it's tricky, but let's take it one step at a time.

This Attention layer changes the weights of the linear layer in such a way that it pays attention to specific things in the sequence (the GRU output). That's why attention is always applied with respect to some context. In this repository I applied multiplicative attention, but additive attention is also quite common.
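The idea above can be sketched as follows. This is a minimal, hypothetical implementation of multiplicative attention over GRU outputs, not the repository's actual layer; the class name, dimensions, and scoring scheme are assumptions for illustration:

```python
import torch
import torch.nn as nn

class MultiplicativeAttention(nn.Module):
    """Hypothetical multiplicative attention: scores are computed by a
    learned bilinear product between sequence positions."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, hiddens: torch.Tensor):
        # hiddens: (batch, seq_len, hidden_dim), e.g. the GRU output
        scores = torch.bmm(self.W(hiddens), hiddens.transpose(1, 2))  # (B, T, T)
        weights = torch.softmax(scores, dim=-1)        # rows sum to 1
        context = torch.bmm(weights, hiddens)          # attended representation
        return context, weights

gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
attn = MultiplicativeAttention(64)

x = torch.randn(2, 10, 32)        # (batch, seq_len, features)
hiddens, _ = gru(x)
context, weights = attn(hiddens)
print(context.shape)              # torch.Size([2, 10, 64])
```

The learned matrix `W` is what lets the layer decide which positions in the sequence to emphasize; the softmax turns the raw scores into a weighting over the context.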

So in summary:

gyr66 commented 1 year ago

Thanks for your detailed explanation! I am not familiar with multiplicative attention, which is probably why I was confused. Thanks a lot again!