OCR service with AWS Rekognition

jkerak commented 6 years ago

I was experimenting with an idea about doing OCR on snapshots of twitch streams of speed runners, and reporting statistics or perhaps even pushing notifications when a speed runner is close to a world record or personal best run. In my research, I came across rotisserie, which at a high level seemed to be very similar to what I was trying to do.

Rather than running and managing your own tesseract instance, have you thought about making use of some out-of-the-box "OCR as a Service" solutions? I don't know how much it typically costs to run the OCR service currently, but I think it's quite possible that the cost could be reduced by using something like AWS Rekognition: https://aws.amazon.com/rekognition/ (text detection on the first 5k images per month is free, then $1 per 1,000 images after that, with no cost of provisioning servers or containers).

I think it would be nice to never have to think about running tesseract or training models - and you could also have an API gateway route to a Lambda function to be responsible for calling AWS Rekognition.
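A minimal sketch of what such a Lambda function might look like, assuming Python and boto3. `detect_text` is the Rekognition text-detection call; the event shape (a JSON body with a base64-encoded `image` field), the `min_confidence` threshold, and the function names are assumptions for illustration, not anything rotisserie defines.

```python
import base64
import json


def extract_lines(client, image_bytes, min_confidence=80.0):
    """Return LINE-level text that Rekognition finds in image_bytes.

    `client` is anything exposing detect_text(Image={"Bytes": ...}),
    for example boto3.client("rekognition").
    """
    response = client.detect_text(Image={"Bytes": image_bytes})
    return [
        d["DetectedText"]
        for d in response.get("TextDetections", [])
        if d.get("Type") == "LINE" and d.get("Confidence", 0.0) >= min_confidence
    ]


def handler(event, context, client=None):
    """Hypothetical Lambda entry point behind an API Gateway route."""
    if client is None:
        # Imported lazily so extract_lines can be exercised without AWS access.
        import boto3
        client = boto3.client("rekognition")
    body = json.loads(event["body"])  # assumed request shape
    image_bytes = base64.b64decode(body["image"])
    lines = extract_lines(client, image_bytes)
    return {"statusCode": 200, "body": json.dumps({"lines": lines})}
```

Passing the client in as a parameter keeps the parsing logic testable with a stub, so the hosted service is only touched in the deployed function.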

Microsoft has a similar offering as well, which might also be a good solution: https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/

Any thoughts on an approach that avoids running or managing the actual OCR "stuff"?

nibalizer commented 6 years ago

Thanks for your interest @jkerak!

It's a pretty hype idea. We'd love to collaborate. It also could get really interesting to do this from inside twitch extensions, which I've not had time to kick the tires on yet.

The app itself only has one 'microservice': the OCR service. We wanted to ship the app as soon as possible, which is why we went with tesseract. Because the OCR work is outside the app, refactoring it to use a different OCR service (either home grown or hosted) would be super easy. This is by design. We would absolutely accept a PR that made Rekognition or another service a configurable option.
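One way a configurable OCR option could be structured is sketched below in Python. This is not rotisserie's actual code: the `OCR_BACKEND` environment variable, the registry, and both backend functions are hypothetical placeholders that only show the selection mechanism.

```python
import os
from typing import Callable, Dict, Optional

# An OCR backend takes raw image bytes and returns the recognized text.
OcrBackend = Callable[[bytes], str]

_BACKENDS: Dict[str, OcrBackend] = {}


def register(name: str):
    """Decorator that adds a backend to the registry under `name`."""
    def decorator(fn: OcrBackend) -> OcrBackend:
        _BACKENDS[name] = fn
        return fn
    return decorator


@register("tesseract")
def tesseract_backend(image: bytes) -> str:
    # Placeholder: would call the existing self-hosted OCR microservice.
    raise NotImplementedError


@register("rekognition")
def rekognition_backend(image: bytes) -> str:
    # Placeholder: would call a hosted OCR API instead.
    raise NotImplementedError


def get_backend(name: Optional[str] = None) -> OcrBackend:
    """Pick a backend by name, falling back to the OCR_BACKEND env var."""
    name = name or os.environ.get("OCR_BACKEND", "tesseract")
    try:
        return _BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"unknown OCR backend {name!r}; choose from {sorted(_BACKENDS)}"
        ) from None
```

With this shape, switching services is a deployment setting rather than a code change, and a new provider only needs one registered function.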

There is also the potential to re-do the app as a sort of more basic service around capturing screenshots from twitch and pushing them onto e.g. a queue or s3 bucket or something. Then rotisserie would be a much simpler app that does something specific with those screenshots. Ideally, twitch would provide an API endpoint to capture a screenshot refreshed on some interval, but I don't want to wait around for that.

Currently, we capture a screenshot every 15 seconds, so we're looking at 172800 OCR operations per month, per stream (4 per minute over a 30-day month). So we'd be way over the Rekognition free tier, but maybe there are cheap alternatives we can explore.

Thanks again for your interest!

jkerak commented 6 years ago

Thanks for the reply. Totally makes sense on the choice of tesseract given your initial goals. With 172800 operations, we're looking at about $172 per month per stream with the Amazon or Microsoft offerings, which definitely seems high. I wonder if you can get around this by splicing together the cropped images from all streams into a single grid and sending that grid to the service as one image. This would bring the cost down to about $172 per month total, instead of per stream. Still a little pricey, but not quite as bad I think.
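The arithmetic and the grid idea can be sketched in a few lines of Python. The prices and the 15-second interval are the ones quoted in this thread; the function names and the tile dimensions in the usage note are made up for illustration, and the sketch only computes the layout rather than composing any image.

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 30-day month, as assumed in the thread


def ocr_calls_per_month(interval_s=15):
    """Number of screenshots (one OCR call each) taken in a month."""
    return SECONDS_PER_MONTH // interval_s


def monthly_cost_usd(calls, price_per_1000=1.0, free_calls=0):
    """Cost of `calls` OCR requests after subtracting any free allowance."""
    billable = max(0, calls - free_calls)
    return billable * price_per_1000 / 1000


def grid_shape(n_streams):
    """Near-square (cols, rows) grid with room for n_streams crops."""
    cols = math.ceil(math.sqrt(n_streams))
    rows = math.ceil(n_streams / cols)
    return cols, rows


def tile_origin(index, cols, tile_w, tile_h):
    """Top-left pixel of crop number `index` inside the combined image."""
    return (index % cols) * tile_w, (index // cols) * tile_h
```

For example, `ocr_calls_per_month()` gives 172800, and `monthly_cost_usd(172800)` gives 172.8, or 167.8 with `free_calls=5000`. With one combined image per interval that figure stays the same however many streams are tiled in; ten crops would fit a `grid_shape(10)` of 4 columns by 3 rows. Whether a hosted service reads many small crops in one image as accurately as separate images, and within its image size limits, would need to be checked.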

nibalizer commented 6 years ago

Hey @jkerak I wanted to circle back to this.

@rmoe trained two models to do the OCR. One specifically against pubg, the other against fortnite. These files are added to the OCR microservice as .pb files. We haven't committed them to the repo but we're figuring out the right way to share them - there has to be a better way than putting them in dropbox.

Anyways with the pubg model we saw an increase from 27% accuracy to 99% accuracy. Since we host them in k8s, it's free.

Pretty neat!

eggshell commented 6 years ago

@nibalizer we could just host it somewhere on the rotisserie website.

rmoe commented 6 years ago

They're hosted in our object storage service. This is where they're pulled from when the OCR container is built (https://github.com/IBM/rotisserie/blob/master/deploy/images/ocr.Dockerfile#L4-L5).

The PUBG model is here: https://s3-api.us-geo.objectstorage.softlayer.net/rotisserie-ml-models/model.pb

and Fortnite is here: https://s3-api.us-geo.objectstorage.softlayer.net/rotisserie-ml-models/fortnite.pb

IBM / rotisserie

OCR service with AWS Rekognition #99