deajan / pmOCR

A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR conversion on file activity
BSD 3-Clause "New" or "Revised" License

Tesseract 4 support #12

Closed bwakkie closed 5 years ago

bwakkie commented 5 years ago

Would it be much effort to add support for tesseract 4 (https://github.com/tesseract-ocr/tesseract)?

deajan commented 5 years ago

This shouldn't require rocket science to get working, but I definitely need to set up a test machine with Tesseract 4 first, because my work environment (CentOS 7) ships with Tesseract 3.

deajan commented 5 years ago

It seems that tesseract 3 and 4 both use the same syntax. I fired up a test machine today and could use tesseract 4 without further problems. I renamed "tesseract3" to "tesseract" to remove the ambiguous config name.
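
For illustration only (this is not pmOCR's code, which is a shell script): a minimal Python sketch of what "same syntax" means in practice. It assumes the usual `tesseract IMAGE OUTPUTBASE -l LANG CONFIG` invocation and a version banner of the form `tesseract X.Y.Z` from `tesseract --version`. The function names, file names, and default values are hypothetical.

```python
import re
from typing import List, Optional


def build_tesseract_cmd(image: str, output_base: str,
                        lang: str = "eng",
                        output_format: str = "pdf") -> List[str]:
    """Build one command line intended to work for both tesseract 3 and 4.

    Layout assumed here: tesseract IMAGE OUTPUTBASE -l LANG CONFIG
    """
    return ["tesseract", image, output_base, "-l", lang, output_format]


def parse_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from the text of `tesseract --version`.

    The caller is expected to capture both stdout and stderr, since the
    banner may appear on either stream depending on the release.
    Returns None if no version banner is found.
    """
    match = re.search(r"tesseract\s+v?(\d+)\.", version_output)
    return int(match.group(1)) if match else None
```

Because the command is identical for both major versions, the version check is only informational here. A caller could pass the list to `subprocess.run` and log the detected major version.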

Would you like to do a test run with the current git master before I release a new version?

bwakkie commented 5 years ago

Please release :) I will check back at a later date, when I have to batch process my 1000+ PDFs.

deajan commented 5 years ago

I'll release shortly. In the meantime, you may already use git master for tesseract 4.X.

deajan commented 5 years ago

Released.