AiPacino / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Tesseract unable to process box/tif file (FAILURE! Couldn't find a matching blob) #1206

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Attempt to train against the attached tif/box file

What is the expected output? What do you see instead?
Expected all characters to train; instead, I see many repeated errors, for
example:
APPLY_BOXES: boxfile line 34/0 ((1908,1420),(1927,1460)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 689/I ((1037,650),(1047,692)): FAILURE! Couldn't find a matching blob
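With many repeated failures, it can help to extract the failing boxes from the training log so they can be inspected or drawn over the image. The following is a minimal Python sketch that assumes the log lines follow exactly the `APPLY_BOXES: boxfile line N/char ((a,b),(c,d)): FAILURE!` shape quoted above, and assumes (not confirmed here) that the two coordinate pairs are (left, bottom) and (right, top). `FailedBox` and `parse_failures` are hypothetical helpers, not part of Tesseract.

```python
import re
from typing import NamedTuple

# Assumed log shape, taken from the messages quoted in this report:
# APPLY_BOXES: boxfile line 34/0 ((1908,1420),(1927,1460)): FAILURE! ...
LOG_RE = re.compile(
    r"APPLY_BOXES: boxfile line (\d+)/(\S+) "
    r"\(\((-?\d+),(-?\d+)\),\((-?\d+),(-?\d+)\)\): FAILURE!"
)


class FailedBox(NamedTuple):
    line: int    # line number within the .box file
    char: str    # character the box was labelled with
    left: int    # assumed: first pair is (left, bottom)
    bottom: int
    right: int   # assumed: second pair is (right, top)
    top: int


def parse_failures(log_text: str) -> list[FailedBox]:
    """Collect every APPLY_BOXES failure found in a training log."""
    failures = []
    for m in LOG_RE.finditer(log_text):
        line, char, left, bottom, right, top = m.groups()
        failures.append(
            FailedBox(int(line), char, int(left), int(bottom), int(right), int(top))
        )
    return failures
```

Note that the log wraps nothing itself; if the messages were copied from a wrapped view (as in the original report), rejoin each message onto one line before parsing.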

What version of the product are you using? On what operating system?
Ubuntu 12.04.  Tesseract 3.03.  I've used Tesseract 3.02.02 as well, with
similar results.

Please provide any additional information below.

I wrote the program to create these box/tif files myself.  I understand that
the training process is very finicky: inputs have to be *exactly* right for it
to work.  I'm mainly looking for guidance about how I can modify my program to
generate a tif/box file that Tesseract would accept.  It's not clear to me what
is wrong with this tif/box.  I've already tried expanding the padding between
characters, and it doesn't make a difference.
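Since the question is what a custom generator should emit, one cheap step is to sanity-check every box line before training. Below is a minimal Python sketch, assuming the Tesseract 3 box format `<char> <left> <bottom> <right> <top> <page>` with y measured upward from the bottom edge of the image, and assuming the caller supplies the image width and height. `Box`, `parse_box_line`, `check_box`, and `to_top_left` are hypothetical helpers, not part of Tesseract. The sketch cannot reproduce the blob-matching step itself; it only catches geometry mistakes such as degenerate boxes, out-of-bounds boxes, or a flipped y axis.

```python
from typing import NamedTuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Expected fields: <char> <left> <bottom> <right> <top> <page>,
    # with y measured upward from the bottom edge of the image.
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    left, bottom, right, top, page = (int(p) for p in parts[1:])
    return Box(parts[0], left, bottom, right, top, page)


def check_box(box: Box, width: int, height: int) -> list[str]:
    """Return a list of geometry problems (empty if none are found)."""
    problems = []
    if box.left >= box.right or box.bottom >= box.top:
        problems.append("degenerate box")
    if box.left < 0 or box.bottom < 0 or box.right > width or box.top > height:
        problems.append("box outside image")
    return problems


def to_top_left(box: Box, height: int) -> tuple[int, int, int, int]:
    # Convert to (x0, y0, x1, y1) with a top-left origin, which is what most
    # image libraries use, so the box can be drawn over the TIFF for review.
    return (box.left, height - box.top, box.right, height - box.bottom)
```

If boxes converted with `to_top_left` do not land on the glyphs when overlaid on the TIFF, the generator is probably writing top-left-origin y values instead of bottom-left ones.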

Original issue reported on code.google.com by matth...@gmail.com on 22 May 2014 at 7:04

Attachments:

GoogleCodeExporter commented 9 years ago
Please follow TrainingTesseract3[1] and use Tesseract's own tools (e.g. text2image)
instead of yours...

[1] https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images

Original comment by zde...@gmail.com on 24 May 2014 at 2:02

GoogleCodeExporter commented 9 years ago
Trust me, I've been through that guide many, many times.  I've written a bunch 
of scripts that automate the complicated process of training.  It doesn't 
mention anything related to this error message that I'm asking about.

Using text2image won't work for me, since TrueType fonts are not accurate
enough.  I'm using real image captures from the actual source images, which
give me better accuracy.

I'm just looking for some guidance.  What am I doing wrong with the attached 
tif files?

Original comment by matth...@gmail.com on 27 May 2014 at 3:02