emedvedev / attention-ocr

A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.
MIT License

ValueError: GraphDef cannot be larger than 2GB #53

Closed feygina closed 6 years ago

feygina commented 6 years ago

I'm training on my own dataset, which contains rather long text lines. I'm wondering if that could be the reason for this error. Does anyone know how to fix it? Thanks!

emedvedev commented 6 years ago

Long lines are, indeed, the problem: when you set a high max-prediction value for longer text, it significantly increases the number of seq2seq encoder/decoder nodes, so the graph size bloats. The title error aside, it might result in pretty bad performance; this model isn't optimized for something like that, unfortunately.
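As a back-of-the-envelope illustration of why this hits a hard wall: the decoder is unrolled statically, so the serialized graph grows roughly linearly with the max prediction length, while protobuf refuses to serialize a `GraphDef` over 2 GB. The byte figures below are invented for illustration, not measured from this model:

```python
TWO_GB = 2 ** 31  # protobuf's hard limit on a serialized GraphDef

def estimated_graph_bytes(base_bytes, bytes_per_step, max_prediction_length):
    """Rough linear model: a statically unrolled seq2seq decoder adds a
    near-constant number of nodes (and serialized bytes) per output step."""
    return base_bytes + bytes_per_step * max_prediction_length

# Hypothetical numbers: a 500 MiB base graph plus 40 MiB of decoder
# nodes per step crosses the 2 GB limit at around 40 steps.
print(estimated_graph_bytes(500 * 2**20, 40 * 2**20, 50) > TWO_GB)  # True
```

So raising the max prediction length to cover very long lines can push the graph past the limit even though each individual step is cheap.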

You could try extracting individual words from your image (or at least splitting the lines into smaller chunks) through segmentation. OpenCV would be a good library to look at if words on your images have well-defined boundaries. Reading Text in the Wild is a good example of text spotting when your images are a bit more complex.
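For images with well-separated words, one simple approach is a vertical projection profile: find runs of empty columns and cut there. A minimal pure-Python sketch of that idea (on real images you'd binarize first and OpenCV's `cv2.findContours` would be more robust; the `split_line_image` helper and its 0/1-matrix input format are illustrative, not part of this repo):

```python
def split_line_image(image, min_gap=2):
    """Split a binarized text-line image (list of rows, 1 = ink, 0 = background)
    into word chunks by finding runs of at least `min_gap` empty columns.
    Returns (start, end) column spans, end-exclusive."""
    width = len(image[0])
    # Vertical projection: does column x contain any ink?
    ink = [any(row[x] for row in image) for x in range(width)]
    chunks, start, gap = [], None, 0
    for x, has_ink in enumerate(ink):
        if has_ink:
            if start is None:
                start = x  # a new chunk begins
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:  # gap wide enough: close the chunk
                chunks.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:  # chunk running off the right edge
        chunks.append((start, width))
    return chunks

# Two "words" separated by a 3-column gap in a 1-row toy image:
print(split_line_image([[0, 1, 1, 0, 0, 0, 1, 1, 0]]))  # [(1, 3), (6, 9)]
```

Each returned span can then be cropped out and fed to the model as a separate short sample, keeping the max prediction length small.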

Finally, if you're just looking for a good OCR library for scanned text (just a wild guess), Tesseract would work much better than this model, I think.

Good luck!

feygina commented 6 years ago

Thank you for your response and advice. I tried Tesseract before, and it gave me only 55% accuracy (I have low-resolution images), while the Google Vision API OCR gave 87%. So now I want to train something that can perform like Google, or even better.

emedvedev commented 6 years ago

You can certainly try this model then, but you'll still have to split your strings into shorter segments.

Or maybe even add an object detection layer to the model, although that'd be a lot of work, and I'm not sure about its results for OCR. Let me know how it goes though, I'm curious now!

kamalkarki commented 5 years ago

@feygina I am also stuck trying to improve accuracy, as I am working on text retrieval from scanned documents. I have also tried Tesseract and some other tools. Can you suggest which directions to think about or explore?