Closed by linhuifj 5 years ago
Hello,
In other fields, yes, it would be unnecessary. In the Scene Text Recognition (STR) field, however, it is (strangely) necessary in order to reproduce previous works. You can also train without ignore_index=0, but we recommend using it, because we use the [GO] token both as the begin-of-sentence token and as the [PAD] token to keep the character set small.
When you read previous STR papers that use an attention module, you will see sentences like this one: "For the decoder, we use a GRU cell that has 256 memory blocks and 37 output units (26 letters, 10 digits, and 1 EOS token)" (from RARE)
Even though this felt strange to me (I thought we would also need [GO], [PAD], [UNK], or something more), we decided to follow the paper: use only 1 additional token, [EOS], so that there are 37 output units for character prediction (= 10 digits + 26 letters + [EOS]). To do so, in our paper's experiments we took an approach similar to https://github.com/marvis/ocr_attention, which simply concatenates the target labels without padding.
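To make the 37-class setup concrete, here is a minimal sketch of encoding labels with a single [EOS] token and concatenating them without padding. It is only an illustration under assumptions: the character order, the `encode_concat` helper, and the token name are hypothetical and are not taken from the linked repository.

```python
import string

# Assumed character set: 10 digits + 26 lowercase letters, plus one [EOS] token.
CHARS = string.digits + string.ascii_lowercase
EOS = "[EOS]"
VOCAB = list(CHARS) + [EOS]  # 37 output classes
CHAR_TO_IDX = {c: i for i, c in enumerate(VOCAB)}


def encode_concat(words):
    """Encode each word, append [EOS], and concatenate with no padding.

    Returns the flat index list and the per-word lengths needed to split it.
    (Hypothetical helper, for illustration only.)
    """
    flat, lengths = [], []
    for word in words:
        ids = [CHAR_TO_IDX[c] for c in word] + [CHAR_TO_IDX[EOS]]
        flat.extend(ids)
        lengths.append(len(ids))
    return flat, lengths


flat, lengths = encode_concat(["cat", "42"])
```

Because nothing is padded, every position in `flat` is a real target, so no index has to be ignored in the loss.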
In this repository, for better readability, we use the [GO] token as the [PAD] token, so ignoring the [GO] token index is the same as ignoring the [PAD] token index. In other words, in our code there are 38 output units for character prediction (= 37 + the [GO] token), and the [GO] token is ignored in the loss.
Of course, if you do not need to reproduce previous works, we recommend using additional tokens besides [EOS], such as [PAD], [GO], and [UNK], and then setting ignore_index to the index of the [PAD] token.
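As a rough sketch of the 38-class setup, the snippet below puts [GO] at index 0, fills the unused tail of each label with that same index, and averages the loss only over positions whose target is not index 0. It is written in plain Python to stay self-contained. The token names, the `encode` and `masked_nll` helpers, and the fixed label length are assumptions for illustration, not the repository's actual code.

```python
import math
import string

GO, EOS = "[GO]", "[s]"  # token names are assumptions
VOCAB = [GO, EOS] + list(string.digits + string.ascii_lowercase)  # 38 classes
CHAR_TO_IDX = {c: i for i, c in enumerate(VOCAB)}
GO_IDX = CHAR_TO_IDX[GO]  # index 0, doubles as the padding index


def encode(word, max_len):
    """[GO] + characters + [EOS], then filled to max_len + 2 with index 0."""
    ids = [GO_IDX] + [CHAR_TO_IDX[c] for c in word] + [CHAR_TO_IDX[EOS]]
    return ids + [GO_IDX] * (max_len + 2 - len(ids))


def masked_nll(log_probs, target, ignore_index=GO_IDX):
    """Mean negative log-likelihood over positions where target != ignore_index."""
    kept = [-lp[t] for lp, t in zip(log_probs, target) if t != ignore_index]
    return sum(kept) / len(kept)


ids = encode("cat", max_len=5)
target = ids[1:]  # drop the leading [GO]; the trailing fill is still index 0

# Dummy predictions: a uniform distribution over the 38 classes at every step.
uniform = [[math.log(1 / len(VOCAB))] * len(VOCAB) for _ in target]
loss = masked_nll(uniform, target)
```

In this sketch, `target` still ends with index 0 after the leading [GO] is dropped, and those filler positions are the ones the mask excludes from the average.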
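For the non-reproduction case, a vocabulary with separate special tokens might look like the sketch below. The token order, the `encode` helper, and the choice to map unseen characters to [UNK] are assumptions for illustration.

```python
import string

SPECIALS = ["[PAD]", "[GO]", "[EOS]", "[UNK]"]
VOCAB = SPECIALS + list(string.digits + string.ascii_lowercase)  # 40 classes
IDX = {tok: i for i, tok in enumerate(VOCAB)}
PAD_IDX, GO_IDX, EOS_IDX, UNK_IDX = (IDX[t] for t in SPECIALS)


def encode(word, max_len):
    """[GO] + characters + [EOS], padded with [PAD] to max_len + 2.

    Characters outside the vocabulary are mapped to [UNK].
    (Hypothetical helper, for illustration only.)
    """
    body = [IDX.get(c, UNK_IDX) for c in word]
    ids = [GO_IDX] + body + [EOS_IDX]
    return ids + [PAD_IDX] * (max_len + 2 - len(ids))


ids = encode("a-1", max_len=4)

# The loss would then skip only the padding positions.
ignore_index = PAD_IDX
```

With this layout, [GO] and [PAD] have different indices, so ignoring padding in the loss never hides a real token.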
Best.
Thank you. I get it.
I think it's unnecessary to set ignore_index, because the [GO] token is removed before calculating the loss.