HarshUpadhyay / TesseractTrainer

A small framework taking over the manual training process described in the Tesseract3 Wiki: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

Support for python 3.x using pillow #7

Closed shantanoo closed 11 years ago

shantanoo commented 11 years ago

I have verified that the same modules are available in Pillow, but I was not able to test the change because I don't have a Tesseract setup available. Please test before pulling the code.

brouberol commented 11 years ago

I suggest changing the requirement in setup.py from PIL to Pillow, as the latter is compatible with both Python 2.x and 3.x. This way, there is no need to introspect the version number and, more importantly, no need to perform a version check at install time to fetch the "good" imaging library (PIL for 2.x and Pillow for 3.x). I've tested it on my laptop with Python 2.7 and 3.2, and Pillow seems to do the job.
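To make the suggestion concrete, here is a minimal sketch of what the dependency declaration could look like. The `SETUP_KWARGS` dict and the `run_setup` helper are hypothetical names used for illustration; the project's actual setup.py has more metadata and calls `setup()` at module level.

```python
# Hypothetical excerpt illustrating the dependency change only;
# all other setup.py metadata is omitted on purpose.
SETUP_KWARGS = {
    "name": "TesseractTrainer",
    # Pillow is published for both Python 2.x and 3.x, so a single
    # requirement replaces any PIL-vs-Pillow version check at install time.
    "install_requires": ["Pillow"],
}


def run_setup():
    """Call setuptools with the kwargs above (a real setup.py does this at import)."""
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

Since Pillow installs under the same `PIL` package name, existing `from PIL import Image` style imports should keep working unchanged.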

I've pushed a few commits to master that adapt the syntax, import mechanism and string/unicode handling to make the code compatible with both Python 2 and 3. You just need to pull my master branch into yours before pushing, so I can merge it.
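For readers unfamiliar with this kind of port, the string/unicode part typically looks something like the sketch below. This is not the code from those commits; `PY3`, `text_type` and `to_text` are hypothetical names chosen for illustration.

```python
from __future__ import print_function, unicode_literals

import sys

# True on Python 3.x, False on Python 2.x.
PY3 = sys.version_info[0] >= 3

# The native text type: `str` on Python 3, `unicode` on Python 2.
# The conditional expression never evaluates `unicode` on Python 3.
text_type = str if PY3 else unicode  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding byte strings with `encoding`."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


# The same call works on both interpreter lines.
print(to_text(b"eng"))
```

Funnelling every string through one helper like this keeps the version check in a single place instead of scattering it across the code base.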

The PR should/could also remove the "Python 2 only" mention from the README.md. Feel free to do it, and also to create a CONTRIBUTORS.txt file at the root of the repo and add your name to it.

Thanks again for bringing Python 3 support to this project!

-- Notes:

I then ran into other issues, but these are related to Tesseract 3.02 behavior. I could not, for the life of me, install Tesseract 3.01 on a new Linux box. The only fix I could find for my error was to install 3.02, and TesseractTrainer is still incompatible with it, even though someone is working on that right now.

Important: when installing Pillow, you need to have the following libraries installed (as for PIL): libjpeg62 libjpeg-dev libfreetype6 libfreetype6-dev zlib1g-dev. Otherwise, some freetype/zlib/jpeg operations will not be supported.
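One way to check whether Pillow was actually built with those libraries is sketched below. It assumes a reasonably recent Pillow release that ships the `PIL.features` module (older releases may not have it); `check_pillow_support` is a hypothetical helper name.

```python
def check_pillow_support(names=("jpg", "zlib", "freetype2")):
    """Return {feature: bool} for the given Pillow features.

    Returns None when Pillow (or its `features` module) cannot be imported.
    """
    try:
        from PIL import features
    except ImportError:
        return None
    return {name: bool(features.check(name)) for name in names}


support = check_pillow_support()
if support is None:
    print("Pillow (with PIL.features) is not available")
else:
    for name, ok in sorted(support.items()):
        print("{0}: {1}".format(name, "supported" if ok else "MISSING"))
```

If a feature is reported as missing, installing the corresponding development package and then reinstalling Pillow is the usual remedy.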