bes-dev / crnn-pytorch

PyTorch implementation of an OCR system using CRNN + CTCLoss
BSD 2-Clause "Simplified" License
217 stars, 55 forks

longer text #9

Open cbasavaraj opened 6 years ago

cbasavaraj commented 6 years ago

Hi, thanks for the repo. It's very well coded and easy to use with a custom dataset. I first tried it on a custom dataset where the average text length is 7 letters, and that works quite well. I'm now using a more complicated dataset with an average text length of 18 characters, where a space can be one of the characters (so multiple words instead of single words). It's still training (I think the GRU takes time), but so far the results have not been that good. With both models the loss goes down quite smoothly, but the average edit distance jumps around quite a lot. For the first model, it improved when I used Adam (your default) instead of AdaDelta, which I was playing with. For the second, Adam isn't really doing the trick. If you've worked a lot with this model and have some ideas, please let me know. Thanks.
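For readers comparing numbers across the two datasets, here is a minimal sketch of how an average edit distance can be computed, plus a per-character variant. It is plain Python and not the repo's own metric code; the function names (`edit_distance`, `average_edit_distance`, `char_error_rate`) are made up for illustration. The per-character version is included because a raw average naturally grows with label length, so 7-character and 18-character datasets are not directly comparable on it.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def average_edit_distance(preds: Sequence[str], targets: Sequence[str]) -> float:
    """Mean edit distance per sample (hypothetical helper)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    return sum(edit_distance(p, t) for p, t in zip(preds, targets)) / len(preds)


def char_error_rate(preds: Sequence[str], targets: Sequence[str]) -> float:
    """Total edits divided by total target characters (hypothetical helper)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    total_chars = sum(len(t) for t in targets)
    if total_chars == 0:
        return 0.0
    total_edits = sum(edit_distance(p, t) for p, t in zip(preds, targets))
    return total_edits / total_chars
```

Evaluating on a fixed validation set rather than a different random batch each time may also make the curve less jumpy, though that is a general suggestion and not something tested here.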

cbasavaraj commented 6 years ago

For anyone else interested in this: it helps to have a bigger projection layer. I changed the default of 10x20 to 10x50 and immediately got a jump. Using ResNet50 instead of ResNet18 also helps.

bes-dev commented 6 years ago

Hi, I think you should look for information bottlenecks in the network. I haven't tested this network with long text sequences.