KaniyamFoundation / ProjectIdeas

A Place to write down the project ideas and to plan them

Make Pottan-OCR work with Tamil #37

Closed tshrinivasan closed 4 years ago

tshrinivasan commented 5 years ago

https://github.com/harish2704/pottan-ocr

Online demo - https://harish2704.github.io/pottan-demo/

This is a Malayalam OCR based on machine learning, backed by PyTorch. Explore it and make it work with Tamil.

tshrinivasan commented 5 years ago

Here is the document on how to train it for Tamil: https://github.com/harish2704/pottan-ocr/wiki/How-to-train-pottan-for-another-language

nithyadurai87 commented 5 years ago

I have installed it locally. The local version for Malayalam is working fine.

But the training instructions for Tamil are not clear.

I will check with the team.

tshrinivasan commented 5 years ago

Note from harish:

It is now possible to run pottan-ocr training using Keras. I added a Google Colab notebook to the repository so that anyone can start a training session on Google Colab right away. Please check https://colab.research.google.com/github/harish2704/pottan-ocr/blob/keras-training/misc/pottan_ocr.ipynb

The model used in the Keras version of the OCR is very lightweight (I removed one conv layer and decreased imageHeight from 32 to 16), but anyone can change that before starting their own training session.

wickkiey commented 5 years ago

Hi, may I know which data is used for the Tamil OCR?

tshrinivasan commented 4 years ago

As Tesseract has matured, development on pottan-ocr has stopped.

Hence, closing this.