StevensSEC / monocle

A mobile app to transcribe images of printed text from a page.
GNU General Public License v3.0

Create TensorFlow model(s) for text detection/recognition #9

Closed. dcarpenter31 closed this issue 3 years ago

dcarpenter31 commented 3 years ago

The app needs a TensorFlow model that identifies text in photos.

rmonaghanjr commented 3 years ago

TensorFlow has an existing OCR model that is documented on this page. It does mention that it isn't the best model for OCR in the wild/in low light but maybe we can use this as a springboard for a different model.

We might also want to look into the OpenCV text detection showcased here.

OpenCV’s EAST text detector is a deep learning model, based on a novel architecture and training pattern. It is capable of (1) running at near real-time at 13 FPS on 720p images and (2) obtains state-of-the-art text detection accuracy.

This could be a good solution to this issue, but it needs more research. This is a very complex topic, after all.

dyc3 commented 3 years ago

I found that page too. There's a catch:

The models are not general enough for OCR in the wild (say, random images taken by a smartphone camera in a low lighting condition).

This might be usable if we gathered some training data and retrained it, but then we might have to label the training data, and that's tedious.

rmonaghanjr commented 3 years ago

It does mention that it isn't the best model for OCR in the wild/in low light but maybe we can use this as a springboard for a different model.

I did see that too, but we would face the same issue if we created our own model anyway. I was more intrigued by the OpenCV text detection because it can run in real time. Maybe we can use that instead of TensorFlow.

EDIT: This article (it is on Medium, so there might be a paywall) lists some datasets for text recognition:

Devanagari Character Dataset
KAIST Scene Text Database

dyc3 commented 3 years ago

I have a feeling that it's gonna be tough to get OpenCV into React Native. I'm gonna go look for potential datasets; there's probably something freely available.

rmonaghanjr commented 3 years ago

I just edited my comment above to add 2 datasets (>1800 images) that we may be able to use.

dyc3 commented 3 years ago

Datasets that I found:

rmonaghanjr commented 3 years ago

That first one with Cornell looks very promising.

dyc3 commented 3 years ago

Yeah, I'm downloading the COCO dataset right now. mscoco.org seems to be down, but I found a mirror. Gonna look into hosting the dataset somewhere easier to access/more reliable.

Edit: Actually, looks like they moved their website to https://cocodataset.org/

rmonaghanjr commented 3 years ago

Some new resources:
https://stackoverflow.com/questions/50344844/tensorflow-js-for-ocr
https://github.com/tensorflow/models/tree/master/research/attention_ocr

Attention OCR looks customizable, so we can use it with the COCO dataset referenced above.
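
A rough sketch of what the data prep for that might look like. It assumes the annotations follow a COCO-Text-style JSON layout: an `anns` mapping whose entries carry `image_id`, `bbox` as `[x, y, width, height]`, `utf8_string`, and `legibility`. That layout is an assumption and should be checked against the file we actually download. The helper names and the `min_side` threshold are made up for illustration, and this is not the Attention OCR input pipeline itself.

```python
def bbox_to_corners(bbox):
    """Convert a COCO-style [x, y, width, height] box to (left, top, right, bottom)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def legible_text_samples(annotations, min_side=8):
    """Yield (image_id, corners, text) for boxes that look usable for training.

    ASSUMPTION: `annotations` is a dict with an "anns" mapping, and each entry
    has "image_id", "bbox", "utf8_string" and "legibility" keys.
    """
    for ann in annotations["anns"].values():
        text = ann.get("utf8_string", "")
        # Skip boxes with no transcription or flagged as not legible.
        if ann.get("legibility") != "legible" or not text:
            continue
        # Skip boxes too small to crop into a readable training image.
        _, _, width, height = ann["bbox"]
        if min(width, height) < min_side:
            continue
        yield ann["image_id"], bbox_to_corners(ann["bbox"]), text


if __name__ == "__main__":
    # Tiny hand-written sample; a real file would be loaded with json.load().
    sample = {
        "anns": {
            "1": {"image_id": 7, "bbox": [10, 20, 30, 40],
                  "utf8_string": "EXIT", "legibility": "legible"},
            "2": {"image_id": 7, "bbox": [0, 0, 50, 50],
                  "utf8_string": "", "legibility": "illegible"},
        }
    }
    for row in legible_text_samples(sample):
        print(row)
```

Each yielded row gives the image to open, the crop rectangle, and the label string, which is roughly what a recognition model needs per training example.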

dyc3 commented 3 years ago

Since we decided not to do this on-device, can we close this?

dcarpenter31 commented 3 years ago

Closed, since we have moved ML processing to a remote server.