OCR has two major building blocks:
Text Detection - Locating the text in an image. This is a hard problem, but deep learning models make it possible to selectively pick out only the text regions that matter.
Text Recognition - Reading the detected regions and returning the text they contain.
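The split into detection and recognition can be sketched as two functions chained together. This is only a toy illustration, not a real OCR engine: the "image" is a list of strings where `.` is background, and `Box`, `detect_text`, `recognize_text`, and `run_ocr` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Toy "image": each string is one row, '.' marks background cells.
Image = List[str]


@dataclass(frozen=True)
class Box:
    row: int    # row that contains the text run
    start: int  # first column of the run (inclusive)
    end: int    # last column of the run (exclusive)


def detect_text(image: Image) -> List[Box]:
    """Detection step: one box per horizontal run of non-background cells."""
    boxes = []
    for r, line in enumerate(image):
        c = 0
        while c < len(line):
            if line[c] != ".":
                start = c
                while c < len(line) and line[c] != ".":
                    c += 1
                boxes.append(Box(r, start, c))
            else:
                c += 1
    return boxes


def recognize_text(image: Image, box: Box) -> str:
    """Recognition step: read the characters inside one detected box."""
    return image[box.row][box.start:box.end]


def run_ocr(image: Image) -> List[Tuple[Box, str]]:
    """Full pipeline: detect every text region, then recognize each one."""
    return [(box, recognize_text(image, box)) for box in detect_text(image)]


toy_image = [
    "..........",
    "..NAME....",
    "..........",
    ".DOB..1990",
]
results = run_ocr(toy_image)
for box, text in results:
    print(box, text)
```

In a real system the two functions would be replaced by a trained detector and a recognizer such as Tesseract, but the data flow (boxes first, then text per box) stays the same.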
Text detection has two main approaches:
Region-Based Detectors
The first step is to find all the regions that may contain objects. Those regions are then passed to a classifier, which tells us which of them hold the required objects and where. So, it is a two-step process.
Algorithms like Faster R-CNN and R-FCN take this approach.
This approach is considered more accurate but is slower than the Single Shot approach.
Single Shot Detectors
Single Shot detectors, however, predict both the bounding box and the class at the same time. Being a single-step process, this approach is much faster.
Single Shot detectors tend to perform poorly when detecting smaller objects.
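The difference in control flow between the two approaches can be sketched on a toy one-row "image". This is a simplified illustration under stated assumptions, not how Faster R-CNN or any real single shot model is implemented: `classify` is a hypothetical stand-in classifier, `.` is background, and the fixed cell width of 4 is arbitrary.

```python
from typing import List, Tuple

Detection = Tuple[int, int, str]  # (start column, end column, class label)


def classify(text: str) -> str:
    """Hypothetical stand-in classifier: label a crop by letter case."""
    return "upper" if text.isupper() else "lower"


def two_stage_detect(row: str) -> List[Detection]:
    # Stage 1: propose candidate regions (runs of non-background cells).
    proposals = []
    start = None
    for i, ch in enumerate(row + "."):  # trailing sentinel closes a final run
        if ch != "." and start is None:
            start = i
        elif ch == "." and start is not None:
            proposals.append((start, i))
            start = None
    # Stage 2: classify each proposed region separately.
    return [(s, e, classify(row[s:e])) for s, e in proposals]


def single_shot_detect(row: str, cell: int = 4) -> List[Detection]:
    # One pass over a fixed grid: each cell emits its box and class together.
    detections = []
    for s in range(0, len(row), cell):
        crop = row[s:s + cell].replace(".", "")
        if crop:
            detections.append((s, min(s + cell, len(row)), classify(crop)))
    return detections


row = "..ab....XYZ.."
two_stage = two_stage_detect(row)
single_shot = single_shot_detect(row)
print(two_stage)    # boxes follow the proposed regions
print(single_shot)  # boxes follow the fixed grid cells
```

In this toy, the two-step version produces boxes that fit each object, while the single-pass version ties its boxes to the grid. Real single shot models refine their boxes with learned offsets, so the sketch only mirrors the one-pass versus two-pass structure.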
Possible solutions for document OCR, with links:
1. Aadhaar Card OCR using Tesseract Library
![tesseract1](https://user-images.githubusercontent.com/40469121/83360353-51c13c00-a39e-11ea-9ce9-172b2265fc12.png)
Link to repo: Here
2. PAN Card OCR using Tesseract Library
![tesseract2](https://user-images.githubusercontent.com/40469121/83360357-5980e080-a39e-11ea-8c22-ffb9f44505d3.png)
Link to repo: Here
3. Tesseract OCR (PyImageSearch)
Link to article: Here
4. OpenCV OCR + Tesseract with LSTM
Link to article: Here
5. Using Computer Vision API services (for higher accuracy, provided the system has an internet connection):
6. Tesseract + OpenCV + Google API
![tesseract3](https://user-images.githubusercontent.com/40469121/83360363-63a2df00-a39e-11ea-893b-442cca0fa343.png)
Link to repo: Here
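Whichever of the options above produces the raw text, ID documents usually need a post-processing step to pull out the fields of interest. The following is a minimal sketch, not taken from any of the linked repos: it assumes the OCR text is already available as a string (for example from `pytesseract.image_to_string`), and the helper names `find_aadhaar` and `find_pan` are hypothetical. It checks format only (12 digits for Aadhaar, five letters + four digits + one letter for PAN) and does not validate checksums. The sample strings are dummy data.

```python
import re
from typing import Optional

# Aadhaar numbers have 12 digits, commonly printed in groups of four.
# Only a plain space is allowed between groups, so digits on separate
# lines (such as a year followed by the ID) are not merged together.
AADHAAR_RE = re.compile(r"\b(\d{4})[ ]?(\d{4})[ ]?(\d{4})\b")

# PAN numbers have five letters, four digits, then one letter.
PAN_RE = re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")


def find_aadhaar(text: str) -> Optional[str]:
    """Return the first Aadhaar-shaped number without spaces, or None."""
    match = AADHAAR_RE.search(text)
    return "".join(match.groups()) if match else None


def find_pan(text: str) -> Optional[str]:
    """Return the first PAN-shaped token, or None."""
    match = PAN_RE.search(text)
    return match.group(0) if match else None


# Dummy OCR output, standing in for text read from a card image.
sample_aadhaar_text = "Government of India\nDOB: 01/01/1990\n1234 5678 9012"
sample_pan_text = "INCOME TAX DEPARTMENT\nPermanent Account Number\nABCDE1234F"

print(find_aadhaar(sample_aadhaar_text))
print(find_pan(sample_pan_text))
```

OCR output is often noisy (for example `O` read as `0`), so a production version would likely need character-level cleanup before matching.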
Important approaches:
Common Frameworks being used: