NJACKWinterOfCode / Printed-Text-recognition-and-conversion

Software for extracting text from scanned images of printed text documents
MIT License
cnn contours-detection image-processing machine-learning opencv-python pyqt4 python3 segmentation sklearn

Printed-Text-recognition-and-conversion

Introduction

There is a growing demand for storing the information held in paper documents on a computer or storage disk and later retrieving it through search. A simple way to capture these documents is to scan them and store them as images. However, this makes the information hard to reuse: reading the contents and searching them line by line and word by word is difficult, because an image is neither searchable nor editable. Even when scanned images are converted directly into PDF, the result is not in an editable or searchable format.

The aim of this project was to build software capable of identifying and recognizing typed English text in an image (.jpg, .jpeg, .png) and converting it to an editable format (.txt, etc.), so that the text can be modified directly without retyping the document manually. The project involves the implementation of image processing techniques and machine learning algorithms.
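
As a rough illustration of the kind of image processing step such a pipeline needs before recognition, the sketch below splits a binarized page into text lines and then into glyph columns using projection profiles. This is not the project's own segmentation code, which is not shown here; it is a simpler stand-in technique, and the names `row_profile`, `find_runs`, `segment_lines` and `segment_columns` are hypothetical. It assumes the image is already a grid of 0/1 values where 1 marks an ink pixel.

```python
from typing import List, Tuple

# Assumption: the page is already binarized, 1 = ink pixel, 0 = background.
Image = List[List[int]]


def row_profile(image: Image) -> List[int]:
    """Count ink pixels in every row (horizontal projection profile)."""
    return [sum(row) for row in image]


def find_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ended on the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the last entry
    return runs


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Find row ranges that contain ink, i.e. candidate text lines."""
    return find_runs(row_profile(image))


def segment_columns(image: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Split the rows top..bottom (exclusive) into column ranges containing ink."""
    if not image:
        return []
    width = len(image[0])
    profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return find_runs(profile)


# Tiny hypothetical page: two "text lines" separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

lines = segment_lines(page)
print(lines)                           # [(1, 3), (4, 5)]
print(segment_columns(page, *lines[0]))  # [(1, 3), (4, 5)]
print(segment_columns(page, *lines[1]))  # [(1, 2), (3, 5)]
```

Each column range found this way would be cropped and passed to a classifier. Projection profiles only work well on clean, unskewed scans; a contour-based approach is more tolerant of touching or slanted glyphs.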

Approach:

To install

The language used is Python3

Required libraries

      NumPy
      OpenCV
      Sklearn
      Scikit
      TensorFlow
      PyQt4

To run through GUI

      python gui.py      

To run on CLI

      python main.py     

Authors

Roshni Ram

Ishita Das

Rohit Shamdasani

Ayush Mudgal