
Holmeyoung / crnn-pytorch

PyTorch implementation of CRNN (CNN + RNN + CTCLoss) for OCR in all languages.

MIT License

378 stars, 105 forks

Maybe the code itself supports training with text length > 26 #50

Open: ghost opened this issue 4 years ago

ghost commented 4 years ago

@Holmeyoung In https://github.com/Holmeyoung/crnn-pytorch/issues/17 you mentioned that your code only supports training with text length <= 26. I found that:

(1) When the images are resized to 100x32, the length of the raw character output is 26, so we cannot train with text length > 26.

(2) When keep_ratio = True, only the height of the image is resized to 32; the width is not fixed and varies from image to image. So the length of the raw character output is not fixed either: it depends on the image width, and maybe we can train with any text length.

Conclusion: we can train with any text length when we set keep_ratio = True during training.
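
The arithmetic behind (1) and (2) can be checked with the standard conv/pool output-size formula. The sketch below assumes the CNN has the layout of the widely used crnn.pytorch CRNN model: 3x3 convolutions with stride 1 and padding 1 (which leave the size unchanged), two MaxPool2d(2, 2) layers, two MaxPool2d((2, 2), (2, 1), (0, 1)) layers, and a final 2x2 convolution without padding. That layout reproduces the T = 26 for a 100-pixel-wide input mentioned in (1), but it is an assumption; check the CRNN class in your copy of the code. The layer tables and helper names (`feature_size`, `rnn_time_steps`, `keep_ratio_width`) are hypothetical, and the repository's own keep_ratio code may round the width differently or pad a whole batch to a shared width.

```python
# Sketch under an assumption: the CNN has the crnn.pytorch-style layout
# described above. Verify against the CRNN class in your copy of the code.

def out_size(size, kernel, stride, padding):
    # Output-size formula shared by nn.Conv2d and nn.MaxPool2d
    # (dilation=1, ceil_mode=False).
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) along one axis for the layers that change the
# size; the 3x3 / stride 1 / padding 1 convolutions are omitted.
WIDTH_LAYERS = [
    (2, 2, 0),  # MaxPool2d(2, 2)
    (2, 2, 0),  # MaxPool2d(2, 2)
    (2, 1, 1),  # MaxPool2d((2, 2), (2, 1), (0, 1)): width stride 1, pad 1
    (2, 1, 1),  # second MaxPool2d((2, 2), (2, 1), (0, 1))
    (2, 1, 0),  # final 2x2 conv, stride 1, no padding
]
HEIGHT_LAYERS = [
    (2, 2, 0),
    (2, 2, 0),
    (2, 2, 0),  # height: stride 2, pad 0 in the (2, 1)-stride pools
    (2, 2, 0),
    (2, 1, 0),
]

def feature_size(size, layers):
    # Apply the output-size formula layer by layer.
    for kernel, stride, padding in layers:
        size = out_size(size, kernel, stride, padding)
    return size

def rnn_time_steps(img_width):
    # Width of the last CNN feature map = number of RNN time steps T.
    return feature_size(img_width, WIDTH_LAYERS)

def keep_ratio_width(orig_w, orig_h, img_h=32):
    # Hypothetical keep_ratio resize: scale so the height becomes img_h.
    return max(1, int(orig_w * img_h / orig_h))

print(feature_size(32, HEIGHT_LAYERS))  # 1: height collapses to one row
print(rnn_time_steps(100))              # 26, matching point (1)
print(rnn_time_steps(keep_ratio_width(300, 48)))  # 51 for a 200-px-wide resize
```

Under these assumptions T = floor(W / 4) + 1 for an input of width W, so with keep_ratio = True a wider image gives more time steps, while a fixed 100-pixel width always gives T = 26.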

Thank you so much.

Holmeyoung commented 4 years ago

Hi, in fact, it depends only on the T length of the LSTM output.

ghost commented 4 years ago

Thanks for your reply. If we don't change the network, will the T length of the LSTM output depend only on the width of the image?

ghost commented 4 years ago

I got the answer from your reply in https://github.com/Holmeyoung/crnn-pytorch/issues/17:

"You need to calculate it. After conv and pool, what's the image width? The image width will be the T length in the RNN."

So the output width of CRNN().cnn will be the T length, and the text length should not exceed T? Is that right? Thank you so much.

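        # Excerpt from CRNN.__init__: the CNN's 512 output channels are the
        # per-time-step input size of the first BidirectionalLSTM.
        self.cnn = cnn
        self.rnn = nn.Sequential(
            BidirectionalLSTM(512, nh, nh),
            BidirectionalLSTM(nh, nh, nclass))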
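
A slightly more precise version of "text length should not exceed T", assuming the standard CTC formulation: CTC merges repeated symbols unless a blank separates them, so each pair of identical adjacent characters costs one extra frame. The helper names below are hypothetical; this is a sketch, not code from this repository.

```python
def min_ctc_frames(label):
    # One frame per character, plus one blank frame between each pair of
    # identical adjacent characters (otherwise CTC would merge them).
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def fits(label, T):
    # True if at least one valid CTC alignment of `label` exists in T frames.
    return min_ctc_frames(label) <= T


print(min_ctc_frames("hello"))  # 6: the "ll" pair needs a blank between
print(fits("a" * 26, 26))       # False: 26 identical chars need 51 frames
```

So a label such as "hello" needs T >= 6, and with T = 26 the 26-character limit only holds for labels without doubled characters.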

Holmeyoung commented 4 years ago


Yeah

ghost commented 4 years ago

Got it, thank you so much.

ghost commented 4 years ago

@Holmeyoung Hi, the output width of CRNN().cnn is T, and the text length should not exceed T. My question is: if the text length is larger than T, will there be errors, or can we still train the model?

Thank you so much.
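
One way to see what happens when a label needs more frames than T: the CTC loss is -log of the total probability of all valid alignments, and when no alignment fits in T frames that probability is 0, so the loss becomes infinite rather than raising an error. The toy forward recursion below (plain Python with plain probabilities, fine for tiny inputs; real implementations work in log space) is an illustration, not the repository's code, and `ctc_neg_log_likelihood` and `uniform` are hypothetical names. If your copy of the code uses PyTorch's nn.CTCLoss, such samples give an inf loss by default, and its zero_infinity=True option zeroes those losses and their gradients; which CTC implementation is used is worth checking.

```python
import math


def ctc_neg_log_likelihood(probs, label, blank=0):
    # Textbook CTC forward (alpha) recursion.
    # probs: T rows of per-class probabilities; label: class indices, no blank.
    # Returns -log p(label | probs), or math.inf if no alignment fits in T.
    ext = [blank]
    for c in label:
        ext += [c, blank]  # blank, l1, blank, l2, ..., blank
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is only allowed between two different symbols.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    p = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return -math.log(p) if p > 0 else math.inf


def uniform(T, classes=3):
    # Toy network output: every class equally likely in every frame.
    return [[1 / classes] * classes for _ in range(T)]


# Classes: 0 = blank, 1 = "a", 2 = "b".
print(ctc_neg_log_likelihood(uniform(2), [1, 2]))  # about 2.197 (= log 9)
print(ctc_neg_log_likelihood(uniform(2), [1, 1]))  # inf: "aa" needs 3 frames
```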