Nrp8247 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

APPLY_BOXES: FAILURE! Couldn't find a matching blob #954

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run Tesseract training on the attached box/tif pair.

What is the expected output? What do you see instead?
I expect all lines of characters to be accepted as valid boxes for their blobs. 
Some lines are not treated correctly, both during training and at recognition 
time.

Although all lines have the same font size and the same line spacing, Tesseract 
gives the following errors/warnings. Please note that Devanagari script has 
'maatraas' (vowel signs) above and below a character.

row xheight=8.66667, but median xheight = 24.75
row xheight=31, but median xheight = 24.75
row xheight=31, but median xheight = 24.75
row xheight=31, but median xheight = 24.75
row xheight=31, but median xheight = 24.75
row xheight=31, but median xheight = 24.75
row xheight=31, but median xheight = 24.75
row xheight=32, but median xheight = 24.75
row xheight=31, but median xheight = 24.75
row xheight=32, but median xheight = 24.75
row xheight=31, but median xheight = 24.75
row xheight=31.4602, but median xheight = 24.75
row xheight=31.4602, but median xheight = 24.75
row xheight=31.4602, but median xheight = 24.75
row xheight=31.4602, but median xheight = 24.75
row xheight=31.4602, but median xheight = 24.75
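To see how far each row deviates from the median, warnings like these can be tallied with a small script. This is a hypothetical helper (`summarize_xheight_warnings` is not part of Tesseract); it only assumes the message format shown in the log above.

```python
import re
from collections import Counter

# Matches the informational message format seen in the log above.
XHEIGHT_RE = re.compile(r"row xheight=([\d.]+), but median xheight = ([\d.]+)")


def summarize_xheight_warnings(log_text):
    """Count each distinct (row_xheight, median_xheight) pair and
    compute the ratio row / median for each pair (rounded to 3 places)."""
    pairs = Counter()
    for m in XHEIGHT_RE.finditer(log_text):
        pairs[(float(m.group(1)), float(m.group(2)))] += 1
    ratios = {pair: round(pair[0] / pair[1], 3) for pair in pairs}
    return pairs, ratios
```

Ratios far from 1.0 mark rows whose detected x-height differs from the rest; for most rows here that is 31 / 24.75, roughly 1.25.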

What version of the product are you using? On what operating system?
Tesseract 3.02 on Windows 7

Please provide any additional information below.
Below is the batch file with the training commands used:

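@echo off
setlocal enabledelayedexpansion
rem Directory that contains the tessdata folder
set TESSDATA_PREFIX=D:\BuildFolder\testing
rem Print the Tesseract version in use
D:\BuildFolder\testing\tesseract -v
::
echo **** creating .tr files ***** 

rem Run box.train on every .tif in the current folder to produce a .tr file
for /f "delims=|" %%F in ('dir *.tif /b') do (
    echo processing %%~nxF
    D:\BuildFolder\testing\tesseract %%~nxF %%~nF -l hin nobatch box.train logfile
    rem Save a per-image copy of tesseract.log
    copy tesseract.log %%~nF.log.txt
    rem Delete the .txt output file
    del %%~nF.txt
)
echo **** done creating .tr files ***** 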
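For those not on Windows, the same loop can be sketched in Python. This is a port under assumptions: the command-line arguments (`-l hin nobatch box.train logfile`) are copied from the batch file, the `tesseract` executable is assumed to be on `PATH` unless a path is passed, and `box_train_command` / `run_box_train` are hypothetical helper names.

```python
import os
import shutil
import subprocess
from pathlib import Path


def box_train_command(tesseract, tif, lang="hin"):
    """Build the argument list used in the batch file:
    <tesseract> <image> <outputbase> -l <lang> nobatch box.train logfile
    """
    tif = Path(tif)
    return [str(tesseract), tif.name, tif.stem, "-l", lang,
            "nobatch", "box.train", "logfile"]


def run_box_train(folder, tesseract="tesseract", lang="hin", tessdata_prefix=None):
    """Run box.train on every .tif in `folder`, mirroring the batch loop."""
    folder = Path(folder)
    env = dict(os.environ)
    if tessdata_prefix is not None:
        # Same role as `set TESSDATA_PREFIX=...` in the batch file.
        env["TESSDATA_PREFIX"] = str(tessdata_prefix)
    for tif in sorted(folder.glob("*.tif")):
        print("processing", tif.name)
        subprocess.run(box_train_command(tesseract, tif, lang),
                       cwd=folder, env=env, check=True)
        log = folder / "tesseract.log"
        if log.exists():
            # Equivalent of `copy tesseract.log %%~nF.log.txt`.
            shutil.copy(log, folder / f"{tif.stem}.log.txt")
        # Equivalent of `del %%~nF.txt`; ignored if no .txt was written.
        (folder / f"{tif.stem}.txt").unlink(missing_ok=True)
```

Usage: `run_box_train(".", tessdata_prefix="/path/to/parent-of-tessdata")`. Unlike the batch loop, `check=True` stops at the first image for which Tesseract exits with an error.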

Original issue reported on code.google.com by shreeshrii on 17 Jul 2013 at 7:29


GoogleCodeExporter commented 9 years ago
Are there config variables to specify the median height to avoid these errors?

Is there a variable which can be set to false to avoid looking at line-height? 

Thanks!

Original comment by shreeshrii on 17 Jul 2013 at 8:16

GoogleCodeExporter commented 9 years ago
Getting the same error with Oriya too:

tesseract, ori.freeserif.exp2.tif, ori.freeserif.exp2, box.train]
Tesseract Open Source OCR Engine v3.02.03 with Leptonica
row xheight=23.5, but median xheight = 16.8
row xheight=23.5, but median xheight = 16.8
row xheight=23.5, but median xheight = 16.8
APPLY_BOXES: boxfile line 257/ନ୍ତି ((336,2826),(390,2864)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 258/ନି ((386,2836),(408,2864)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:     422
   Boxes failed resegmentation:       2
   Found 420 good blobs.
   Leaving 9 unlabelled blobs in 0 words.
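To locate the failing boxes in a long training log, the APPLY_BOXES failure messages can be parsed. This is a hypothetical helper (`failed_boxes` is not a Tesseract tool), assuming the message format shown above with each message on a single line in the raw log. The coordinate pairs are assumed to be (left, bottom) and (right, top).

```python
import re

# "APPLY_BOXES: boxfile line <n>/<symbol> ((<x1>,<y1>),(<x2>,<y2>)): FAILURE!"
FAIL_RE = re.compile(
    r"APPLY_BOXES: boxfile line (\d+)/(.*?) "
    r"\(\((-?\d+),(-?\d+)\),\((-?\d+),(-?\d+)\)\): FAILURE!"
)


def failed_boxes(log_text):
    """Return (box_line_no, symbol, (left, bottom, right, top)) per failure."""
    result = []
    for m in FAIL_RE.finditer(log_text):
        line_no, symbol = int(m.group(1)), m.group(2)
        coords = tuple(int(g) for g in m.groups()[2:])
        result.append((line_no, symbol, coords))
    return result
```

The returned line numbers can then be checked against the .box file and the image.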

Original comment by shreeshrii on 12 Nov 2013 at 1:37

GoogleCodeExporter commented 9 years ago
I have the same problem too. How can it be solved?

Original comment by stylishm...@gmail.com on 19 Jun 2014 at 5:20

GoogleCodeExporter commented 9 years ago
The message about the median x-height is not an error; it is only 
informational. If you want to avoid it, fix your input image so that the 
x-height of each row is the same.
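One rough way to check row consistency before training is to group the boxes of a .box file into rows and compare their heights. This is a hypothetical sketch: it assumes the box-file line format `<symbol> <left> <bottom> <right> <top> <page>` and uses full box height as a crude proxy for x-height, which Tesseract computes differently (marks above and below a character, such as Devanagari maatraas, inflate box heights).

```python
from statistics import median


def row_heights(box_text, page=0):
    """Group boxes into rows by vertical overlap; return the median
    box height of each row, top row first."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 6:
            continue
        # The last five fields are left, bottom, right, top, page.
        left, bottom, right, top, pg = map(int, parts[-5:])
        if pg == page:
            boxes.append((bottom, top))
    rows = []  # each row: [row_bottom, row_top, [box heights]]
    for bottom, top in sorted(boxes, key=lambda b: -b[1]):
        for row in rows:
            if bottom < row[1] and top > row[0]:  # vertical ranges overlap
                row[0], row[1] = min(row[0], bottom), max(row[1], top)
                row[2].append(top - bottom)
                break
        else:
            rows.append([bottom, top, [top - bottom]])
    return [median(r[2]) for r in rows]
```

Rows whose median height stands out from the others are candidates to inspect in the image.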

Original comment by zde...@gmail.com on 22 Jun 2014 at 8:27

GoogleCodeExporter commented 9 years ago
Also, I found that Tesseract has (training) problems if the (non-Chinese/Japanese) 
input image has vertical spaces.

Original comment by zde...@gmail.com on 22 Jun 2014 at 8:39

GoogleCodeExporter commented 9 years ago
Hi, I also get this problem when training!

Test blob assigned to row at (-193,0) on pass 0
Test blob y=(-193,0), row=(-241.250000,-48.250000), overlap=144.750000
Test blob assigned to row at (-241.25,-23.5) on pass 4
Test blob y=(-193,0), row=(-253.625000,-11.125000), overlap=181.875000
Test blob assigned to row at (-253.625,-11.125) on pass 1
FAIL!
APPLY_BOXES: boxfile line 0/1 ((91,136),(108,181)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1/2 ((161,136),(187,181)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 2/5 ((562,138),(589,183)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 3/3 ((28,73),(55,119)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 4/6 ((118,72),(145,118)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 5/4 ((310,74),(338,119)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 6/5 ((385,74),(412,119)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 7/7 ((27,11),(55,56)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 8/9 ((490,12),(518,59)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 9/0 ((265,10),(292,57)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 10/8 ((603,75),(632,124)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:      11
   Boxes failed resegmentation:      11
APPLY_BOXES: Unlabelled word at :Bounding box=(-669,-193)->(0,0)
   Found 0 good blobs.
   1 remaining unlabelled words deleted.
Generated training data for 0 words

Is there a solution for this yet?

Mac OS X 10.9.4
Tesseract 3.02.02
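The overlap values in the "Test blob" debug lines are consistent with the length of the intersection of the blob's y-range and the row's y-range. For the blob y=(-193,0) and the row (-241.25,-48.25), min(0, -48.25) - max(-193, -241.25) = 144.75, and for the row (-253.625,-11.125) the same formula gives 181.875. The helper below reproduces that arithmetic. It is an illustrative re-derivation with a hypothetical name (`interval_overlap`), not code taken from Tesseract.

```python
def interval_overlap(a, b):
    """Length of the intersection of two (low, high) intervals; 0.0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
```

Applying the same test to x-ranges gives 0 for every box in this log, because all box left edges are positive while the unlabelled word's bounding box spans x = -669 to 0.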

Original comment by mossberg...@gmail.com on 30 Jul 2014 at 8:21


GoogleCodeExporter commented 9 years ago
Same problem when trying to train for Church Slavonic.

Original comment by aleksandr.andreev@gmail.com on 14 Aug 2014 at 3:31