bgshih / crnn

Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.
MIT License
2.05k stars 549 forks source link

Pretrained model training set contains IIIT5k images too? #33

Open NightFury13 opened 7 years ago

NightFury13 commented 7 years ago

I ran the demo codes using the pretrained model and I seem to be getting around 86% word-level accuracy on the IIIT5k dataset whereas the paper suggests ~80% accuracy. Were there any other pre/post processing steps involved in training the pretrained-model provided?

bgshih commented 7 years ago

@NightFury13 The released model should deliver close results to the ones reported in the paper. Our best result did not reach 86% on IIIT5k. We did not include IIIT5k training data either.

rremani commented 7 years ago

Hi @bgshih ,

I read your paper so there are some points I wasn't clear on.

  1. While training do I have to use cropped words or I can use the natural scene images where text is available to train these images. I have this doubt because I tested the demo model given here on natural scene images which wasn't able to recognise multiple words or even a single word.

  2. My research area lies in identifying non- dictionary words from raw images taken from mobile vans and use it to update open street maps, so mostly I will be encountering names of shops and interesting places. To tackle this problem, I thought of using py-faster-rcnn to first identify the words from the image and then train the cropped words using your model. But reading through your paper I felt there isn't any need of object detection 'words' and your model can actually be trained on natural images.

  3. Finally, does image captioning a similar kind of problem, only difference being it doesn't contain and text. So, overall sense I got is that features of the word are learnt which helps in distinguishing between individual letters so helps in identifying even the lexicon free words.

It would be helpful to know, if my interpretation about your model is correct and a bit of guidance would be appreciated.

Thanks

bgshih commented 7 years ago

@rremani 1.2. CRNN is only for cropped words. For whole images with much more background, a text detection method is required to detect text first. If you feed whole images directly into CRNN, the text would be buried by background noise.

  1. I think it is possible to employ an image captioning model to this problem. But I am concerned with the order you set for the output words -- For image captioning problem the words are naturally ordered by grammar. But for text detection you need to define that order meaningfully.
rremani commented 7 years ago

@bgshih How about including a Region Proposal Network inside CRNN that can make it end to end trainable ? Is training such a model harder ? Tried both separately which had nice results as expected. Thanks for your valuable suggestions.

bgshih commented 7 years ago

@rremani Sorry but I am not sure if that will work -- worth a try, I think.

rremani commented 7 years ago

@bgshih Thanks a lot for your suggestions!

rayush7 commented 7 years ago

@rremani I am working on the problem of OCR in the wild (text detection + recognition on natural images). As @bgshih suggested that we need a text detection module on top of CRNN, which would extract the text segments and that would be the input to CRNN. Could you please explain how did you use Region Proposal Network inside CRNN to make it end to end trainable? Please Suggest.

rremani commented 7 years ago

Hi @rayush7, I haven't tried an end to end model yet, used SSD for extraction and CRNN for recognition.

bgshih commented 7 years ago

FYI, we have recently released another project for text detection https://github.com/MhLiao/TextBoxes.

rayush7 commented 7 years ago

@bgshih @rremani Thanks guys. Will surely look into Textboxes.

ahmedmazari-dhatim commented 7 years ago

Hi @bgshih

does your TextBoxes https://github.com/MhLiao/TextBoxes work on invoices ?

such as : image_0032

Thank you