Originally reported by Neskie Manuel <neskiem@gmail.com> to Catalin Francu on
Sat, Sep 19, 2009 at 12:49 AM
Patch created by zdenop. Original message:
I downloaded your tesseractTrainer.py script to help with training
tesseract for Secwepemctsín. It works really well and has helped out
quite a bit. I've added a 'Save To PDF' function that uses the
python-reportlab library.
Right now I convert the tif to a jpg and then load that and save it.
I don't know the code to load the tif and convert it to a jpg. I
got this decoder 3 group error in part of the libjpg file. So to
create a PDF you now need a TIFF, a BOX, and a JPG, and then you can
output a PDF with the text in place. The text is offset in height, and
the pieces aren't actually words. I can
work on some word placement. Could Pango be used for the word
merging? I'm sure there are more algorithms.
With a bit more automation to work with tesseract, OmniPage should look out.
I've attached the updated tesseractTrainer.py. More work and it could
be more than just a trainer.
-Neskie Manuel
Original issue reported on code.google.com by zde...@gmail.com on 7 Aug 2010 at 11:06
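The word placement mentioned in the message could be approached by merging neighbouring character boxes. The following is only a rough sketch, not the code from the attached patch: it assumes the usual box-file line layout (glyph, left, bottom, right, top, optional page) with the origin at the bottom-left of the image, and the names `Box`, `parse_box_lines`, `merge_into_words`, `to_points` and the `max_gap` threshold are all made up for illustration. The default of 300 DPI is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_lines(lines):
    """Parse box-file lines of the form '<glyph> <left> <bottom> <right> <top> [page]'."""
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append(Box(parts[0], left, bottom, right, top))
    return boxes


def merge_into_words(boxes, max_gap=8):
    """Greedily join consecutive boxes into words.

    Assumes boxes arrive in reading order. Two boxes are joined when they
    overlap vertically and the horizontal gap between them is at most
    max_gap pixels (a hypothetical threshold; tune it per font and DPI).
    """
    words = []
    for box in boxes:
        if words:
            prev = words[-1]
            gap = box.left - prev.right
            overlaps = box.bottom < prev.top and box.top > prev.bottom
            if overlaps and -max_gap <= gap <= max_gap:
                words[-1] = Box(
                    prev.text + box.text,
                    prev.left,
                    min(prev.bottom, box.bottom),
                    box.right,
                    max(prev.top, box.top),
                )
                continue
        words.append(box)
    return words


def to_points(value_px, dpi=300):
    """Convert image pixels to PDF points (72 per inch); dpi is assumed."""
    return value_px * 72.0 / dpi


sample = [
    "T 10 100 20 120 0",
    "h 22 100 30 120 0",
    "e 32 100 40 112 0",
    "c 60 100 68 112 0",
    "a 70 100 78 112 0",
    "t 80 100 86 118 0",
]
words = merge_into_words(parse_box_lines(sample))
for word in words:
    print(word.text, to_points(word.left), to_points(word.bottom))
```

Since PDF user space also has its origin at the bottom-left by default, the scaled `left` and `bottom` values could be passed to a text-drawing call as they are. Any remaining height offset would then come from the font baseline rather than from the coordinate system.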