A small framework that automates the manual training process described in the Tesseract3 Wiki: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
I am running Tesseract 3.02 and trying to train it with a new font. Because I was having problems with that, I downloaded the standard TIF/BOX pairs and attempted to train Tesseract with them. I converted the .G4. versions of the TIFs to uncompressed format. Every subset of fonts I have tried produces this error:
C:\Tesseract-OCR\tessdata>..\shapeclustering -F font_properties -U unicharset -O unicharset eng.arial.tr eng.arialbd.tr eng.arialbi.tr eng.ariali.tr
Reading eng.arial.tr ...
Reading eng.arialbd.tr ...
Reading eng.arialbi.tr ...
Reading eng.ariali.tr ...
Font id = -1/0, class id = 1/108 on sample 0
font_id >= 0 && font_id < font_idmap.SparseSize():Error:Assert failed:in file ....\classify\trainingsampleset.cpp, line 622
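One thing worth ruling out (an assumption on my part; the log does not confirm it) is a mismatch between the font names encoded in the .tr filenames and the entries in font_properties, since "Font id = -1" suggests a font lookup failed. The Python sketch below checks for that. It assumes the lang.fontname[.expN].tr naming convention and the "fontname italic bold fixed serif fraktur" line layout for font_properties; the script and its function names are hypothetical helpers, not part of Tesseract.

```python
import os
import sys


def font_name_from_tr(filename):
    """Return the font name assumed to be encoded in a .tr filename.

    Assumption: names follow lang.fontname[.expN].tr, so the font name
    is the second dot-separated field (e.g. "arial" in eng.arial.tr).
    """
    parts = os.path.basename(filename).split(".")
    return parts[1] if len(parts) >= 3 else None


def parse_font_properties(lines):
    """Return {fontname: [flags]} from font_properties lines.

    Assumed layout: fontname italic bold fixed serif fraktur.
    Blank lines are skipped; strip() also removes Windows '\r' endings.
    """
    entries = {}
    for line in lines:
        fields = line.strip().split()
        if fields:
            entries[fields[0]] = fields[1:]
    return entries


def find_problems(props, tr_files):
    """List every .tr file whose font is missing or malformed in props."""
    problems = []
    for tr in tr_files:
        name = font_name_from_tr(tr)
        if name is None:
            problems.append(f"{tr}: cannot derive a font name")
        elif name not in props:
            problems.append(f"{tr}: font '{name}' missing from font_properties")
        elif len(props[name]) != 5:
            problems.append(f"{tr}: entry for '{name}' should have 5 flags")
    return problems


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: check_fonts.py font_properties FILE.tr [FILE.tr ...]")
    else:
        # utf-8 (not utf-8-sig) so a BOM shows up as a mismatched first name
        with open(sys.argv[1], encoding="utf-8") as f:
            props = parse_font_properties(f)
        problems = find_problems(props, sys.argv[2:])
        for problem in problems:
            print(problem)
        if not problems:
            print("all fonts found in font_properties")
```

Run it from the tessdata directory with the same arguments passed to shapeclustering, e.g. `python check_fonts.py font_properties eng.arial.tr eng.arialbd.tr eng.arialbi.tr eng.ariali.tr`. An empty problem list does not rule out other causes of the assertion.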
Any assistance would be appreciated.