GoogleCodeExporter closed this issue 9 years ago
tesseract.log reports two messages:
"utf-8 string too long at line 699": how do I locate line 699 in the box file or the image file?
"APPLY_BOXES: Unlabelled word blk:1 row:12 allrows:12": how do I locate blk:1 / row:12 / allrows:12 in the box file or the image file? (I am using IrfanView and Paintbrush.)
Any guidance would be appreciated.
Original comment by withbles...@gmail.com
on 6 Jul 2009 at 7:15
I have updated the wiki to cover the new limit, which is 24 bytes.
Your box file contains the following (hex Unicode code points) at line 699:
ca4 ccd ca4 ccd caf ca8 ccd caf ccb (9 code points x 3 bytes each = 27 bytes total)
Is this really a single syllable? It doesn't look right to me: there is no virama between the caf and the ca8, so it looks like 2 syllables.
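The byte count and the missing-virama observation can be checked with a short Python sketch. It is only an illustration, not how Tesseract itself validates box files: `decode` and `split_clusters` are hypothetical helpers, and the splitting rule (start a new cluster at every consonant that does not follow a virama) is a simplification that ignores independent vowels and rarer letters.

```python
# Code points quoted above for line 699 of the box file.
HEX_CODE_POINTS = "ca4 ccd ca4 ccd caf ca8 ccd caf ccb"
LIMIT_BYTES = 24       # limit stated in this thread
VIRAMA = "\u0ccd"      # KANNADA SIGN VIRAMA


def decode(hex_string):
    """Turn space-separated hex code points into a Python string."""
    return "".join(chr(int(h, 16)) for h in hex_string.split())


def is_consonant(ch):
    # Assumption: treat U+0C95..U+0CB9 (KA..HA) as the consonant range.
    return "\u0c95" <= ch <= "\u0cb9"


def split_clusters(text):
    """Naive split: a consonant not preceded by a virama starts a new cluster."""
    clusters = []
    for i, ch in enumerate(text):
        starts_new = is_consonant(ch) and (i == 0 or text[i - 1] != VIRAMA)
        if starts_new or not clusters:
            clusters.append(ch)
        else:
            clusters[-1] += ch
    return clusters


word = decode(HEX_CODE_POINTS)
print(len(word.encode("utf-8")), "bytes; limit is", LIMIT_BYTES)
for cluster in split_clusters(word):
    print(cluster, len(cluster.encode("utf-8")), "bytes")
```

Under this rule the string splits at ca8, giving two clusters that are each well under 24 bytes.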
You can go to line 699 by opening the file in VC++ and typing Ctrl+G followed by the line number.
Original comment by theraysm...@gmail.com
on 6 Jul 2009 at 5:02
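For anyone without VC++, the same lookup can be scripted. The sketch below is a hedged example, not part of Tesseract: `find_long_entries` is a hypothetical helper, it assumes each box-file line is the label followed by whitespace-separated coordinates (as in the lines quoted in this thread), and it assumes "too long" means strictly more than 24 bytes, which the thread does not confirm.

```python
def find_long_entries(lines, limit_bytes=24):
    """Return (line_number, label, utf8_size) for labels over the limit.

    Line numbers are 1-based so they match the numbers in tesseract.log.
    """
    hits = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label = fields[0]  # assumption: the label is the first field
        size = len(label.encode("utf-8"))
        if size > limit_bytes:
            hits.append((line_no, label, size))
    return hits


def read_line(lines, line_no):
    """Return the 1-based line `line_no`, e.g. 699."""
    return lines[line_no - 1]


# Usage (the file name is hypothetical):
# with open("kan.box", encoding="utf-8") as f:
#     lines = f.read().splitlines()
# print(read_line(lines, 699))
# for hit in find_long_entries(lines):
#     print(hit)
```

Reading the file as UTF-8 matters here, because the limit is counted in bytes rather than in characters.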
I opened the box file in VC++, pressed Ctrl+G, and typed 699. It pointed to
"ಸ್ಯಾನ್ಯಃ 1024 538 1114 576".
With the help of http://rishida.net/scripts/uniview/conversion.php I looked up the
Unicode code points for ಸ್ಯಾನ್ಯಃ, and also for "ca4 ccd ca4 ccd caf ca8 ccd caf ccb"
{ತ್ತ್ಯನ್ಯೋ 909 536 1009 576}. The details for both are noted below.
-----------------
"ಸ್ಯಾನ್ಯಃ 1024 538 1114 576" [ಸ್ಯಾ ನ್ಯಃ]
0CB8 ಸ KANNADA LETTER SA
0CCD ್ KANNADA SIGN VIRAMA
0CAF ಯ KANNADA LETTER YA
0CBE ಾ KANNADA VOWEL SIGN AA
0CA8 ನ KANNADA LETTER NA
0CCD ್ KANNADA SIGN VIRAMA
0CAF ಯ KANNADA LETTER YA
0C83 ಃ KANNADA SIGN VISARGA
0020 SPACE
---------------------------------------------------
"ca4 ccd ca4 ccd caf ca8 ccd caf ccb"{ತ್ತ್ಯನ್ಯೋ 909 536 1009
576}[ತ್ತ್ಯ ನ್ಯೋ]
ತ U+0CA4: KANNADA LETTER TA (Kannada)
್ U+0CCD: KANNADA SIGN VIRAMA (Kannada)
ತ U+0CA4: KANNADA LETTER TA (Kannada)
್ U+0CCD: KANNADA SIGN VIRAMA (Kannada)
ಯ U+0CAF: KANNADA LETTER YA (Kannada)
-------
ನ U+0CA8: KANNADA LETTER NA (Kannada)
್ U+0CCD: KANNADA SIGN VIRAMA (Kannada)
ಯ U+0CAF: KANNADA LETTER YA (Kannada)
ೋ U+0CCB: KANNADA VOWEL SIGN OO (Kannada)
Original comment by withbles...@gmail.com
on 6 Jul 2009 at 6:30
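A listing like the one above can also be produced locally with Python's standard `unicodedata` module instead of the online converter. This is a minimal sketch; `describe` is a hypothetical helper name, and the input is built from the hex values quoted in this thread.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Build the string from the hex code points quoted for line 699.
hex_points = "ca4 ccd ca4 ccd caf ca8 ccd caf ccb"
word = "".join(chr(int(h, 16)) for h in hex_points.split())

for code_point, name in describe(word):
    print(code_point, name)
```

Counting the printed rows gives the number of code points, and multiplying by 3 gives the UTF-8 size for characters in the Kannada block.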
Original issue reported on code.google.com by
withbles...@gmail.com
on 6 Jul 2009 at 6:57