clovaai / deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Apache License 2.0

bug in the code: do not need to ignore index for attention loss #83

Closed: linhuifj closed this issue 5 years ago

linhuifj commented 5 years ago
    criterion = torch.nn.CrossEntropyLoss(ignore_index=0).to(device)  # ignore [GO] token = ignore index 0

I think ignore_index is unnecessary here, because the [GO] token is removed from the target before the loss is calculated.

ku21fan commented 5 years ago

Hello,

In other fields: yes, it would be unnecessary. However, in the Scene Text Recognition (STR) field, it is (strangely) necessary in order to reproduce previous work. You can also train without ignore_index=0, but we recommend using it, because we use the [GO] token both as a begin-of-sentence token and as a [PAD] token, which keeps the character set small.
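To illustrate what ignore_index changes, here is a minimal pure-Python sketch of a mean cross-entropy that skips ignored targets. The function `cross_entropy` and the toy numbers are made up for illustration and are not the repository's code. `torch.nn.CrossEntropyLoss(ignore_index=0)` with its default mean reduction behaves analogously, averaging only over the non-ignored targets.

```python
import math


def cross_entropy(logits, targets, ignore_index=None):
    """Mean negative log-likelihood over positions whose target != ignore_index."""
    losses = []
    for scores, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padded position: contributes nothing to the loss
        log_norm = math.log(sum(math.exp(s) for s in scores))
        losses.append(log_norm - scores[target])
    if not losses:
        return 0.0  # choice made for this sketch when every position is ignored
    return sum(losses) / len(losses)


# Three time steps over 3 classes; class 0 is [GO], reused here as padding.
logits = [[0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]]
targets = [1, 2, 0]  # the last step is padding

with_pad = cross_entropy(logits, targets)
without_pad = cross_entropy(logits, targets, ignore_index=0)
print(with_pad, without_pad)
```

Without ignore_index, the padded step is scored as if the model had to predict class 0 there, so it adds to the loss even though it carries no character information.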

If you read earlier STR papers that use an attention module, you will see sentences like this one: "For the decoder, we use a GRU cell that has 256 memory blocks and 37 output units (26 letters, 10 digits, and 1 EOS token)" (from RARE)

Even though this felt strange to me (I thought we would also need [GO], [PAD], [UNK], or other tokens), we decided to follow the paper: use only 1 additional token, [EOS], so that character prediction has 37 output units (= 10 digits + 26 letters + [EOS]). To do so in our paper experiments, we took an approach similar to https://github.com/marvis/ocr_attention, which simply concatenates the target labels without padding.
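The general idea of concatenating targets without padding can be sketched as follows. This is only an illustration of the technique, not code taken from the linked repository; the helper `concat_targets` and the index assignment (0 = [EOS], 1 to 36 = characters) are assumptions.

```python
def concat_targets(encoded_words):
    """Flatten variable-length label sequences into one list plus their lengths."""
    flat, lengths = [], []
    for labels in encoded_words:
        flat.extend(labels)
        lengths.append(len(labels))
    return flat, lengths


# Hypothetical indices: 0 = [EOS], 1..36 = characters (37 classes in total).
flat, lengths = concat_targets([[11, 12, 0], [18, 15, 22, 22, 25, 0]])
print(flat)
print(lengths)
```

Because every entry of the flattened list is a real label, no padding index exists and nothing has to be ignored in the loss.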

In this repository, for better readability, we use the [GO] token as the [PAD] token as well, so ignoring the [GO] index is the same as ignoring the [PAD] index. In other words, our code has 38 output units for character prediction (= 37 + the [GO] token), and the loss ignores the [GO] token.
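A simplified sketch of this scheme shows why index 0 still appears in the target after the leading [GO] is removed. The encoder below is hypothetical and much simpler than the repository's converter; the token order and the `encode` helper are assumptions.

```python
# Hypothetical, simplified label encoder: index 0 is [GO] and also serves as padding.
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"
TOKENS = ["[GO]", "[EOS]"] + list(CHARSET)  # 38 classes in total
INDEX = {tok: i for i, tok in enumerate(TOKENS)}


def encode(words, max_len):
    """Return one row per word: [GO], characters, [EOS], then index-0 padding."""
    rows = []
    for word in words:
        row = [INDEX["[GO]"]] + [INDEX[c] for c in word] + [INDEX["[EOS]"]]
        row += [INDEX["[GO]"]] * (max_len + 2 - len(row))  # pad with index 0
        rows.append(row)
    return rows


batch = encode(["ab", "hello"], max_len=5)
targets = [row[1:] for row in batch]  # drop the leading [GO], as done before the loss
for row in targets:
    print(row)
```

The shorter word still ends in a run of zeros after its [EOS]. Those positions are padding, and they are what ignore_index=0 excludes from the loss.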

Of course, if you do not need to reproduce previous work, we recommend using separate tokens in addition to [EOS], such as [PAD], [GO], [UNK], and so on, and then setting ignore_index to the [PAD] token index.
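A minimal sketch of such a vocabulary with separate special tokens might look like this. The token order, the `encode` helper, and the name `PAD_INDEX` are assumptions chosen for illustration.

```python
# Hypothetical vocabulary with a dedicated token for each role.
SPECIAL = ["[PAD]", "[GO]", "[EOS]", "[UNK]"]
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"
TOKENS = SPECIAL + list(CHARSET)
INDEX = {tok: i for i, tok in enumerate(TOKENS)}
PAD_INDEX = INDEX["[PAD]"]


def encode(word, max_len):
    """Encode one word as [GO], characters ([UNK] if unseen), [EOS], then [PAD]."""
    ids = [INDEX["[GO]"]]
    ids += [INDEX.get(c, INDEX["[UNK]"]) for c in word]
    ids.append(INDEX["[EOS]"])
    ids += [PAD_INDEX] * (max_len + 2 - len(ids))
    return ids


ids = encode("a-b", max_len=4)
print(ids)
```

With this layout the loss would be built with the padding index rather than the [GO] index, for example `torch.nn.CrossEntropyLoss(ignore_index=PAD_INDEX)`, so [GO] keeps a single meaning.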

Best.

linhuifj commented 5 years ago

Thank you. I get it.