BackupGGCode / pytesseracttrainer

Visual tesseract box file editor
GNU General Public License v3.0

Feature: patch for 'Save To PDF' #2

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Originally reported by Neskie Manuel <neskiem@gmail.com> to Catalin Francu on 
Sat, Sep 19, 2009 at 12:49 AM

Patch created by zdenop. Original message:

I downloaded your tesseractTrainer.py script to help with training
tesseract for Secwepemctsín.  It works really well and has helped out
quite a bit. I've added a 'Save To PDF' function that uses the
python-reportlab library.

Right now I convert the TIFF to a JPG, then load the JPG and save it.
I don't know the code to load the TIFF and convert it to a JPG. I
got this "decoder 3 group" error in part of the libjpg file.  So to
create a PDF you now need three files: a TIFF, a BOX, and a JPG.

With those you can output a PDF with the text in place.  It's offset
in height, and the characters aren't actually merged into words.  I
can work on some word placement.  Could Pango be used for the word
merging?  I'm sure there are more algorithms.

With a bit more automation to work with tesseract, OmniPage should
look out.  I've attached the updated tesseractTrainer.py.  With more
work it could be more than just a trainer.

-Neskie Manuel

Original issue reported on code.google.com by zde...@gmail.com on 7 Aug 2010 at 11:06
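
The word-merging question raised in the message can be illustrated with a small sketch. This is not the code from the attached patch; it is a minimal example under stated assumptions: the box file has one glyph per line in the form "glyph left bottom right top [page]" with the origin at the bottom-left of the image, glyphs appear in reading order, and two glyphs belong to the same word when their horizontal gap is small and their vertical extents overlap. The names Box, parse_box_lines, merge_into_words, to_pdf_points and the max_gap threshold are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Box(NamedTuple):
    text: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_lines(lines: Iterable[str]) -> List[Box]:
    """Parse box-file lines of the form 'glyph left bottom right top [page]'."""
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append(Box(parts[0], left, bottom, right, top))
    return boxes


def merge_into_words(boxes: Iterable[Box], max_gap: int = 10) -> List[Box]:
    """Merge consecutive glyph boxes into word boxes.

    Assumption: a glyph continues the previous word when it overlaps it
    vertically and the horizontal gap is within max_gap pixels.
    """
    words: List[Box] = []
    for box in boxes:
        if words:
            prev = words[-1]
            gap = box.left - prev.right
            same_line = box.bottom < prev.top and box.top > prev.bottom
            if same_line and -max_gap <= gap <= max_gap:
                # Grow the previous word so it also covers this glyph.
                words[-1] = Box(
                    prev.text + box.text,
                    min(prev.left, box.left),
                    min(prev.bottom, box.bottom),
                    max(prev.right, box.right),
                    max(prev.top, box.top),
                )
                continue
        words.append(box)
    return words


def to_pdf_points(box: Box, dpi: int = 300) -> Tuple[float, float]:
    """Convert a box's lower-left corner from pixels to PDF points (1/72 inch)."""
    return (box.left * 72.0 / dpi, box.bottom * 72.0 / dpi)


sample = [
    "T 10 100 20 120 0",
    "e 22 100 30 112 0",
    "s 60 100 68 112 0",
    "t 70 100 76 118 0",
    "x 10 60 18 72 0",
]
words = merge_into_words(parse_box_lines(sample))
print([w.text for w in words])  # prints ['Te', 'st', 'x']
```

Because both the box file and the default PDF coordinate space put the origin at the bottom-left, each merged word could then be placed with a call such as reportlab's canvas.drawString(x, y, text), using the point coordinates from to_pdf_points. The image resolution (dpi) is an assumption and would have to match the scanned TIFF.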
