gnewtothis101 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Reading accented characters in a language without accents #1352

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Attempt to read Russian text with accented vowels.
2. The results do not come out correctly.
3. The character ó is read as б, for example.

What is the expected output? What do you see instead?
Would like to receive the correct output for all accented Russian vowels.

What version of the product are you using? On what operating system?
3.0.2

Please provide any additional information below.
Accents are not typically used in written Russian, but they are often used to 
show learners of the language where the stress falls in a word. Russian has no 
fixed rules for stress placement, so it is impossible to know where the stress 
lies without memorizing it.
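
A minimal Python sketch of what the accented text looks like at the code-point level, assuming the stress marks are encoded as the combining acute accent U+0301 placed after the vowel. Stressed Cyrillic vowels have no precomposed forms in Unicode, so this is the usual encoding, but the actual input may differ. The helper name `strip_stress` is hypothetical. Removing the marks this way could help when comparing OCR output against unaccented ground truth.

```python
import unicodedata

# Assumption: stress is marked with the combining acute accent (U+0301).
COMBINING_ACUTE = "\u0301"


def strip_stress(text: str) -> str:
    """Remove combining acute accents while keeping other diacritics.

    NFD splits precomposed letters into base + combining marks, so a
    precomposed Latin "ó" is handled as well. NFC afterwards restores
    letters such as "й" and "ё", whose marks are not the acute accent.
    Note: non-Russian letters built on the acute (e.g. Macedonian "ѓ")
    would also lose their accent.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_acute = decomposed.replace(COMBINING_ACUTE, "")
    return unicodedata.normalize("NFC", without_acute)


if __name__ == "__main__":
    stressed = "моло" + COMBINING_ACUTE + "ко"  # "молоко" with a stress mark
    # Show each code point so the combining mark is visible.
    for ch in stressed:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(strip_stress(stressed))
```

The stressed vowel occupies two code points (the vowel plus U+0301), so a character set trained only on plain Russian letters has no matching symbol for it.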

Original issue reported on code.google.com by spm...@gmail.com on 23 Oct 2014 at 9:19

GoogleCodeExporter commented 9 years ago
Can you please provide an example image that demonstrates the problem?

Original comment by zde...@gmail.com on 2 May 2015 at 8:20

GoogleCodeExporter commented 9 years ago
Attached is a page from the "Orthoepic dictionary of the Russian language" 
edited by Avanesov.

Original comment by stchisti...@gmail.com on 4 May 2015 at 6:03

GoogleCodeExporter commented 9 years ago
I'm moving this to a GitHub issue: 
https://github.com/tesseract-ocr/langdata/issues/8
Now that we have the langdata repository, it makes sense to me to track 
requests for language packs there. This one in particular interests me; I'll 
look into it once I'm done migrating issues.

Original comment by joregan on 13 May 2015 at 9:15