afrozchakure / Aadhar-OCR

This is a repository for Aadhar OCR 💳
GNU General Public License v3.0

Already available solutions for document verification #2

Open afrozchakure opened 4 years ago


Possible Solutions to Document OCR with links:

1. Aadhaar Card OCR using the Tesseract library
Link to repo: Here

2. PAN Card OCR using the Tesseract library
Link to repo: Here

3. Tesseract OCR (PyImageSearch)
Link to article: Here

4. OpenCV OCR + Tesseract with LSTM
Link to article: Here

5. Computer Vision API services (for higher accuracy, provided the system has an internet connection)

6. Tesseract + OpenCV + Google API
Link to repo: Here
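A common preprocessing step in Tesseract + OpenCV pipelines like these is binarizing the card image before OCR. In OpenCV this is typically `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`. The sketch below reimplements Otsu's criterion in NumPy so the computation is visible; `otsu_threshold` and `binarize` are hypothetical helper names, and an 8-bit grayscale input is assumed. The resulting array could then be passed to Tesseract, for example through `pytesseract.image_to_string`.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Pick the threshold that maximizes between-class variance (Otsu).

    Assumes an 8-bit grayscale image (values 0..255). Pixels <= t form
    class 0, pixels > t form class 1.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)          # probability of class 0 for each t
    mu = np.cumsum(prob * levels)    # cumulative mean intensity up to t
    mu_total = mu[-1]                # mean intensity of the whole image

    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 1e-12            # skip thresholds that leave a class empty
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)
```

Choosing the threshold from the image histogram, instead of hard-coding one, tends to cope better with cards photographed under different lighting.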

OCR has two major building blocks:

  1. Text Detection - Locating the regions of the image that contain the required text. This is a hard task, but deep learning makes it possible to read text selectively from an image.
  2. Text Recognition - Reading the characters in each detected region and returning them as text.
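The split between the two building blocks can be sketched as a small pipeline. `detect` and `recognize` are hypothetical stand-ins passed in as functions: in a real system `detect` might be a deep-learning text detector and `recognize` might be Tesseract run on each crop. The toy versions at the bottom exist only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive
Box = Tuple[int, int, int, int]
# A grayscale image as rows of pixel values
Image = List[List[int]]


@dataclass
class OcrResult:
    box: Box
    text: str


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by `box` out of `image`."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def run_ocr(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    recognize: Callable[[Image], str],
) -> List[OcrResult]:
    """Stage 1 (detection) finds text regions; stage 2 (recognition) reads each one."""
    results = []
    for box in detect(image):
        region = crop(image, box)
        results.append(OcrResult(box=box, text=recognize(region)))
    return results


# Toy usage: each "pixel" holds a character code, so the recognizer can
# decode a crop directly. Real detectors and recognizers work on pixels.
def toy_detect(img: Image) -> List[Box]:
    return [(0, 0, 2, 1), (4, 0, 6, 1)]


def toy_recognize(region: Image) -> str:
    return "".join(chr(v) for v in region[0])


image = [[ord(ch) for ch in "HI  OK"]]
print([r.text for r in run_ocr(image, toy_detect, toy_recognize)])  # ['HI', 'OK']
```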

Two approaches to text detection:

  1. Region-Based Detectors
    • First, the detector proposes candidate regions that may contain objects; then it passes those regions to a classifier, which labels each one and gives the locations of the required objects. It is therefore a two-step process.
    • Algorithms like Faster R-CNN and R-FCN take this approach.
    • This approach is generally more accurate but slower than the single-shot approach.
  2. Single Shot Detectors
    • Single-shot detectors, by contrast, predict both the bounding box and the class in the same step. Because there is only one stage, they are much faster.
    • Single-shot detectors often struggle to detect small objects.
    • SSD and YOLO are single-shot detectors.
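To make "predict the box and the class at the same time" concrete, here is a sketch of how a simplified single-shot detector's output could be decoded. It assumes the network has already produced a grid `pred` in one forward pass, with one box per cell and no anchors; that layout is closer to the original YOLO than to YOLOv3, which predicts several anchor-based boxes per cell at three scales. `decode_grid` and the tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2, class_id, score) in image pixels
Detection = Tuple[float, float, float, float, int, float]


def decode_grid(
    pred: Sequence[Sequence[Sequence[float]]],
    img_w: float,
    img_h: float,
    score_thresh: float = 0.5,
) -> List[Detection]:
    """Decode one forward pass of a simplified single-shot detector.

    pred[row][col] = (objectness, cx, cy, w, h, p_class0, p_class1, ...)
      cx, cy: box centre as a fraction of the cell (0..1)
      w, h:   box size as a fraction of the whole image (0..1)
    """
    cell_w = img_w / len(pred[0])
    cell_h = img_h / len(pred)
    detections: List[Detection] = []
    for r, row in enumerate(pred):
        for c, cell in enumerate(row):
            obj, cx, cy, w, h, *class_probs = cell
            # Box and class come from the same cell vector: no proposal stage.
            cls = max(range(len(class_probs)), key=lambda k: class_probs[k])
            score = obj * class_probs[cls]
            if score < score_thresh:
                continue
            xc, yc = (c + cx) * cell_w, (r + cy) * cell_h
            bw, bh = w * img_w, h * img_h
            detections.append(
                (xc - bw / 2, yc - bh / 2, xc + bw / 2, yc + bh / 2, cls, score)
            )
    return detections
```

In practice the raw detections from such a decoder are then de-duplicated with non-maximum suppression.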

Important approaches:

  1. YOLOv3 architecture
  2. Faster R-CNN architecture
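Both architectures produce many overlapping candidate boxes and rely on non-maximum suppression (NMS) to keep one box per object: Faster R-CNN applies it to its region proposals and final detections, and YOLOv3 applies it when post-processing its predictions. Below is a plain-Python sketch of greedy NMS with intersection-over-union (IoU); the function names and the `(x1, y1, x2, y2)` box format are assumptions, and production code would normally call a framework routine such as TensorFlow's `tf.image.non_max_suppression` instead.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

The IoU threshold controls how aggressively neighbours are suppressed: a lower value removes more overlapping boxes, which can also merge two genuinely adjacent text fields.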

Commonly used frameworks:

  1. Tesseract
  2. OpenCV
  3. NumPy
  4. TensorFlow