For this project, I wanted to build an application that could read books to children. In order to achieve this, I designed a workflow which performs the following steps.
My dataset was made from hundreds of photos of my kids' books as well as a number of library books taken in various lighting conditions, orientations, and distances. I used labelImg to annotate my dataset with bounding boxes so I could train the model to identify Text Blocks on a page.
The Model was trained using MXNet using a VGG 16 model as a base. The steps used for training are outlined in this notebook
This project is built using GreenGrass, Python, MXNet, OpenCV, Tesseract, and AWS Polly.
In order to get the Text Area cleaned up to perform OCR, the program needs perform pre-processing on the image using a number of filters in OpenCV. This graphic shows an example of the steps that ReadToMe goes through with each image before trying to turn the image into text.
The lambda consists of three main files.
Because the user has no way to tell the DeepLens when a book is in front of the camera, we use the model to detect blocks of text on the page. When we find a text block, we isolate the image using the getRoi() function inside of imageProcessing.py.
Another important step that is performed is correctSkew() in imageProcessing.py. This warps/rotates the text block to try to make the text horizontal. If the text is angled or skewed, there will be problems when trying to do OCR.
Finally we remove any non utf-8 characters after doinc ocr. RemoveNonUtf8BadChars() in imageProcessing.py. This step just attempts to clean up the text before turning the text to speach.
To run this project on the deeplens, you will need to first install a few packages from a terminal window on the DeepLens.
sudo apt-get update && sudo apt-get install tesseract-ocr && apt-get install python-gi
Also, in order to get the audio to work on DeepLens, I had to perform the following steps:
Instructions for creating a DeepLens project can be found in the online documentation
The model files for this project are located here: https://github.com/alexschultz/ReadToMe/tree/master/mxnet-model
You will need to tar up the files and put them in S3 when you create the project for the DeepLens. See the official AWS documentation for detailed instructions
The Lambda function code is contained inside the lambda directory in this repo.
In order to build the lambda function, you will need to install the project dependencies and package up the lambda to be deployed to AWS. This needs to be done from a linux machine as the dependencies are OS specific.
To install the pip packages, cd into the lambda directory and run the following command:
pip install -r requirements.txt
Once you have the pip packages installed, you need to bundle up the files into a distributable so it can be uploaded to the Lambda console.
I have included a helper python script to package up the lambda in the format that AWS requires See the documentation referenced above for more details on how to create a custom DeepLens project.
If you have any questions or find any issues with this project, please open an issue, Thanks!