Closed GoogleCodeExporter closed 9 years ago
I get this problem too.
All the time.
It's ridiculous.
Original comment by g...@folkplanet.com
on 25 Apr 2012 at 3:46
@galt@folkplanet.com: What is ridiculous is complaining without providing
examples or tests.
There could be several different reasons why Tesseract is complaining.
Original comment by zde...@gmail.com
on 25 Apr 2012 at 6:58
Hi,
Continuing with the above problem, where Tesseract skips reading some text,
please find enclosed the following files: tam.TAMKambanWide.exp00.png,
tam.TAMKambanWide.exp00.box.orig and tam.TAMKambanWide.exp00.box.
As the name suggests, the font is a "wide" font. When the box file is created
(the original box file: tam.TAMKambanWide.exp00.box.orig), the boxes start from the
middle of the page (in fact from the + sign) and not from the top-left of the page.
(See below: it should start from ா 239 3285 258 3302 0.)
+ 613 3237 631 3249 0
அ 721 3223 737 3261 0
க்ஷு 827 3223 843 3262 0
/ 881 3228 913 3263 0
^ 1253 3232 1281 3255 0
* 1266 3285 1277 3309 0
ு 1355 3232 1384 3255 0
ம 1354 3285 1376 3309 0
` 1418 3232 1431 3256 0
@ 1413 3285 1459 3317 0
- 1475 3232 1478 3234 0
0 1465 3237 1487 3256 0
ா 239 3285 258 3302 0
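For readers unfamiliar with the format: each box file line is `glyph left bottom right top page`, with coordinates measured from the bottom-left corner of the image. A minimal Python sketch for reading such lines follows. The `Box` class and the `parse_box_line` helper are hypothetical names used for illustration and are not part of Tesseract.

```python
from dataclasses import dataclass


@dataclass
class Box:
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.top - self.bottom


def parse_box_line(line: str) -> Box:
    """Parse one 'glyph left bottom right top page' line."""
    # Split from the right so the glyph may consist of several code points.
    glyph, *numbers = line.strip().rsplit(None, 5)
    left, bottom, right, top, page = (int(n) for n in numbers)
    return Box(glyph, left, bottom, right, top, page)


# One line taken from the listing above.
box = parse_box_line("அ 721 3223 737 3261 0")
print(box.glyph, box.width, box.height)
```

Printing the width and height of each box like this is a quick way to spot boxes that are implausibly thin or small before training.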
The box file was then edited, and the revised file (tam.TAMKambanWide.exp00.box) was
used for training, but segmentation errors follow, as shown below:
==
C:\indicocr\tesseract301>tesseract test.png test -l tam batch.nochop makebox
Tesseract Open Source OCR Engine v3.01 with Leptonica
C:\indicocr\tesseract301>tesseract tam.TAMKambanWide.exp00.png tam.TAMKambanWide.exp00 nobatch box.train
Tesseract Open Source OCR Engine v3.01 with Leptonica
APPLY_BOXES: boxfile line 16/! ((1314,3284),(1318,3309)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 39/[ ((674,3224),(684,3262)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 41/] ((774,3224),(784,3262)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 44/| ((966,3224),(970,3262)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 46/: ((1063,3232),(1068,3249)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 47/' ((1107,3251),(1113,3263)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 48/" ((1154,3251),(1174,3263)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 51/. ((1316,3232),(1321,3235)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
Boxes read from boxfile: 1683
Boxes failed resegmentation: 8
Found 1675 good blobs and 0 unlabelled blobs in 0 words.
0 remaining unlabelled words deleted.
TRAINING ... Font name = TAMKambanWide
Generated training data for 804 words
==
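When many boxes fail, it can help to collect the failures from the log and look up the corresponding box file lines. The following is a minimal Python sketch that assumes the message format shown in the log above. `parse_failures` is a hypothetical helper, and the pattern may need adjusting for other Tesseract versions.

```python
import re

# Pattern modelled on the APPLY_BOXES messages in the log above.
FAILURE_RE = re.compile(
    r"APPLY_BOXES: boxfile line (\d+)/(\S+) "
    r"\(\((\d+),(\d+)\),\((\d+),(\d+)\)\): FAILURE!"
)


def parse_failures(log_text: str):
    """Return (line_number, glyph, (x1, y1, x2, y2)) for each failed box."""
    # Collapse whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    failures = []
    for m in FAILURE_RE.finditer(flat):
        line_no, glyph = int(m.group(1)), m.group(2)
        coords = tuple(int(m.group(i)) for i in range(3, 7))
        failures.append((line_no, glyph, coords))
    return failures


sample = """APPLY_BOXES: boxfile line 16/! ((1314,3284),(1318,3309)): FAILURE! Couldn't
find a matching blob
APPLY_BOXES: boxfile line 39/[ ((674,3224),(684,3262)): FAILURE! Couldn't find
a matching blob"""
for failure in parse_failures(sample):
    print(failure)
```

The reported line numbers can then be compared with the edited box file to see which glyphs and coordinates Tesseract could not match.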
Any solution or comments?
Regards,
rnkantan
Original comment by rnkan...@gmail.com
on 2 May 2012 at 11:54
Hello,
I have the same problem.
I want to improve the recognition speed for the OCR B font, just for digits and <>
characters.
I use QT Box v1.08 for the bounding boxes. It seems to me that QT Box recognizes
the characters (or blobs) but Tesseract misses some. In my example I have
1100 characters on the page and Tesseract only finds 900.
I have attached my files and a screenshot of the issue; any help would be
appreciated.
Original comment by kaszin...@gmail.com
on 24 Oct 2012 at 10:03
@kaszinova: Your image does not follow the criteria mentioned on the training wiki.
That is why you got the error messages.
Version 3.02.02 recognizes 1080 of the 1100 characters ;-)
If you visualize the error messages, you can see the problem (the red boxes in
errors_in_ocr.normal.exp0.png).
If I make the order of your characters more realistic (see ocr.normal.exp1.png &
ocr.normal.exp1.box), tesseract 3.02 produces no errors.
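To draw failing boxes over a page image in this way, the box coordinates first need converting, since box files measure y from the bottom of the image while most image libraries measure it from the top. Here is a small Python sketch, assuming the coordinates in the APPLY_BOXES messages use the same bottom-left origin as the box file. The name `to_image_coords` and the image height of 3508 pixels are illustrative assumptions, and the sample box is taken from the earlier log in this thread.

```python
def to_image_coords(left, bottom, right, top, image_height):
    """Convert a bottom-left-origin box to a top-left-origin rectangle.

    Returns (x0, y0, x1, y1) with y0 <= y1, suitable for drawing.
    """
    return (left, image_height - top, right, image_height - bottom)


# Hypothetical page height; read the real value from the image itself.
IMAGE_HEIGHT = 3508
print(to_image_coords(1314, 3284, 1318, 3309, IMAGE_HEIGHT))
```

The resulting rectangle can be passed to whatever drawing tool is at hand to outline the box in red.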
Original comment by zde...@gmail.com
on 4 Jan 2013 at 11:18
@rnkantan: Can you please try 3.02 (or, better, the current svn code)? I tried:
tesseract tam.TAMKambanWide.exp00.png tam.TAMKambanWide.exp00 nobatch box.train
and it worked:
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
row xheight=24, but median xheight = 17.631
row xheight=24, but median xheight = 17.631
row xheight=26, but median xheight = 17.631
row xheight=26, but median xheight = 17.631
row xheight=26, but median xheight = 17.631
row xheight=26, but median xheight = 17.631
row xheight=26, but median xheight = 17.631
row xheight=26, but median xheight = 17.631
APPLY_BOXES:
Boxes read from boxfile: 1683
Found 1683 good blobs.
TRAINING ... Font name = TAMKambanWide
Generated training data for 669 words
I then tried:
tesseract tam.TAB_Kamban_Italic.exp00.tif tam.TAB_Kamban_Italic.exp00 nobatch box.train
and I got:
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
row xheight=24, but median xheight = 17.5323
row xheight=26, but median xheight = 17.5323
row xheight=26, but median xheight = 17.5323
row xheight=24, but median xheight = 17.5323
APPLY_BOXES: boxfile line 1543/ே ((2149,1949),(2178,1980)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 3552/-! ((367,277),(376,301)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
Boxes read from boxfile: 3553
Boxes failed resegmentation: 2
APPLY_BOXES: Unlabelled word at :Bounding box=(239,3113)->(396,3153)
APPLY_BOXES: Unlabelled word at :Bounding box=(937,3116)->(1066,3151)
APPLY_BOXES: Unlabelled word at :Bounding box=(1229,3114)->(1360,3151)
APPLY_BOXES: Unlabelled word at :Bounding box=(1380,3114)->(1518,3151)
APPLY_BOXES: Unlabelled word at :Bounding box=(1545,3116)->(1688,3151)
APPLY_BOXES: Unlabelled word at :Bounding box=(239,2661)->(396,2701)
APPLY_BOXES: Unlabelled word at :Bounding box=(937,2664)->(1066,2699)
APPLY_BOXES: Unlabelled word at :Bounding box=(1229,2662)->(1360,2699)
APPLY_BOXES: Unlabelled word at :Bounding box=(1380,2662)->(1518,2699)
APPLY_BOXES: Unlabelled word at :Bounding box=(1545,2664)->(1688,2699)
APPLY_BOXES: Unlabelled word at :Bounding box=(607,2120)->(702,2156)
APPLY_BOXES: Unlabelled word at :Bounding box=(1443,2120)->(1538,2156)
APPLY_BOXES: Unlabelled word at :Bounding box=(444,864)->(505,890)
APPLY_BOXES: Unlabelled word at :Bounding box=(576,493)->(707,530)
APPLY_BOXES: Unlabelled word at :Bounding box=(2094,493)->(2209,530)
APPLY_BOXES: Unlabelled word at :Bounding box=(611,311)->(678,349)
APPLY_BOXES: Unlabelled word at :Bounding box=(1969,311)->(2035,349)
Found 3551 good blobs.
Leaving 6 unlabelled blobs in 0 words.
17 remaining unlabelled words deleted.
TRAINING ... Font name = TAB_Kamban_Italic
Generated training data for 674 words
And I think that these errors are correct (i.e. you need to fix the box file).
Original comment by zde...@gmail.com
on 4 Jan 2013 at 11:40
Original comment by zde...@gmail.com
on 4 Feb 2013 at 10:03
Original issue reported on code.google.com by
rnkan...@gmail.com
on 23 Apr 2012 at 10:34