Closed GoogleCodeExporter closed 9 years ago
1. Please can you try current svn (version 3.02)?
2. Can you please also attach example files for testing (zzz.ocra.exp0.tif and
zzz.ocra.exp0.box)?
Original comment by zde...@gmail.com
on 27 Jul 2012 at 6:43
Unfortunately, the same issue exists in v3.02 (same code there, so same effect
:) ). I would like to send the image, but I cannot due to work issues. I suspect,
however, that if you used a few patch files as samples, you too could generate
more than 200 training words in a page.
BTW, there's a typo in the patch I sent (it matters at least if you care about
backwards compatibility when checking your vectors): lines 16-18 should read
+ chop_index += '0' - 1;
+ else
+ chop_index += 'A' - 11;
Interestingly, the INVALID_UNICHAR_ID code (and its comment) must date from
before the assert() was added.
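
To make the effect of the corrected offsets easier to see, here is a minimal sketch of the mapping they appear to implement. It is not the patched Tesseract code: the helper name `EncodeChopIndex`, the 1-based index, the `<= 10` branch condition, and the upper bound of 36 are all assumptions, since the thread only shows the two assignments and the `else`. It also assumes an ASCII character set.

```cpp
#include <cassert>

// Hypothetical helper for illustration only; not a function from Tesseract.
// Assumption: chop_index is 1-based, and indices 1..10 take the digit branch.
// The real branch condition is not shown in the thread.
char EncodeChopIndex(int chop_index) {
  // Assumed valid range: 10 digits plus 26 letters.
  assert(chop_index >= 1 && chop_index <= 36);
  if (chop_index <= 10)
    chop_index += '0' - 1;   // 1..10  map to '0'..'9'
  else
    chop_index += 'A' - 11;  // 11..36 map to 'A'..'Z'
  return static_cast<char>(chop_index);
}
```

Under these assumptions the two branches join up without a gap: index 10 gives '9' and index 11 gives 'A', so existing vectors that used single digits keep the same encoding.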
Original comment by pddf...@gmail.com
on 27 Jul 2012 at 12:42
I cannot reproduce it (openSUSE 12.1):
$ tesseract slk.cambria.exp001.tif slk.cambria.exp001 nobatch box.train
Tesseract Open Source OCR Engine v3.01 with Leptonica
Page 0
APPLY_BOXES: boxfile line 537/— ((189,2623),(237,2628)): FAILURE! Couldn't
find a matching blob
APPLY_BOXES:
Boxes read from boxfile: 2018
Boxes failed resegmentation: 1
Found 2017 good blobs and 0 unlabelled blobs in 0 words.
0 remaining unlabelled words deleted.
TRAINING ... Font name = cambria
Generated training data for 433 words
Original comment by zde...@gmail.com
on 27 Jul 2012 at 11:24
This issue was closed by revision r742.
Original comment by theraysm...@gmail.com
on 21 Sep 2012 at 3:19
Original issue reported on code.google.com by pddf...@gmail.com
on 26 Jul 2012 at 10:50