JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0
24.41k stars 3.16k forks

Latvian language incorrect #98

Closed by viktors1982 4 years ago

viktors1982 commented 4 years ago

Hey,

Latvian recognition is incorrect: letters with long marks (macrons) are not recognized.

import easyocr
reader = easyocr.Reader(['lv'])
txt = reader.readtext('avatar.jpg')
print(txt)

[image: avatar]

Output:

[([[42, 12], [463, 12], [463, 62], [42, 62]], 'Gada ir tikai divas dienas,', 0.057680848985910416), ([[53, 57], [449, 57], [449, 101], [53, 101]], 'kuras mes neko nevaram', 0.41570523381233215), ([[39, 101], [143, 101], [143, 143], [39, 143]], 'iesakt.', 0.9080015420913696), ([[155, 101], [463, 101], [463, 143], [155, 143]], 'Viena ir vakardiena', 0.32598042488098145), ([[185, 143], [469, 143], [469, 187], [185, 187]], 'ritdiena. Tadejadi', 0.2768159508705139), ([[35, 149], [153, 149], [153, 185], [35, 185]], 'un otra', 0.5262686610221863), ([[66, 184], [434, 184], [434, 232], [66, 232]], 'šodien ir ista diena, lai', 0.17492368817329407), ([[63, 231], [442, 231], [442, 272], [63, 272]], 'miletu, ticetu, daritu un', 0.17923244833946228), ([[68, 266], [267, 266], [267, 320], [68, 320]], 'galvenokart', 0.8032127618789673), ([[295, 275], [431, 275], [431, 313], [295, 313]], 'dzivotu!', 0.7159291505813599)]
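
A quick way to confirm the symptom is to scan the recognized strings for macron letters. The sketch below is only an illustration: `has_macron` is a hypothetical helper (not part of EasyOCR), and the two sample tuples are copied from the output above. It assumes each result is a `(bounding box, text, confidence)` tuple, as the output suggests.

```python
import unicodedata

# U+0304 is the combining macron that appears when letters such as
# "ā" or "ī" are decomposed with NFD normalization.
COMBINING_MACRON = "\u0304"

def has_macron(text):
    """Return True if any character in text carries a macron."""
    return COMBINING_MACRON in unicodedata.normalize("NFD", text)

# Two entries copied from the reported output: (bounding box, text, confidence)
results = [
    ([[66, 184], [434, 184], [434, 232], [66, 232]],
     'šodien ir ista diena, lai', 0.17492368817329407),
    ([[63, 231], [442, 231], [442, 272], [63, 272]],
     'miletu, ticetu, daritu un', 0.17923244833946228),
]

# Collect every recognized line that contains at least one macron letter.
macron_lines = [text for _box, text, _conf in results if has_macron(text)]
print(macron_lines)
```

For these two lines the list comes back empty: the caron in "š" survives, but no macron letter does, which matches the report.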

rkcosmos commented 4 years ago

That's because:
1. My list of Latvian characters doesn't include the characters with long marks (see this file). You can help by sending a PR with an updated character list.
2. Even with an updated character list, the current Latin model doesn't include those characters, so you will have to wait for the next version of the model before they can be recognized.
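
To see which letters a character list lacks, one can compare it against the Latvian alphabet. The following is a minimal sketch, not EasyOCR code: `missing_latvian_chars` and the sample `plain_list` are hypothetical, and nothing is assumed here about the layout of the project's actual character file.

```python
import unicodedata

# The 33 lowercase letters of the Latvian alphabet.
# Note that it has no q, w, x, or y, so nothing sits between p and r.
LATVIAN_LOWER = unicodedata.normalize(
    "NFC", "aābcčdeēfgģhiījkķlļmnņoprsštuūvzž"
)

def missing_latvian_chars(char_list):
    """Return Latvian letters (upper case first, then lower) absent from char_list."""
    expected = LATVIAN_LOWER.upper() + LATVIAN_LOWER
    # Normalize so that decomposed input (letter + combining mark) still matches.
    present = set(unicodedata.normalize("NFC", char_list))
    return [ch for ch in expected if ch not in present]

# Hypothetical character list that lacks the four macron vowels in both cases.
plain_list = (
    "ABCČDEFGĢHIJKĶLĻMNŅOPRSŠTUVZŽ"
    "abcčdefgģhijkķlļmnņoprsštuvzž"
)

print(missing_latvian_chars(plain_list))
```

With this sample list, the function reports the macron vowels Ā, Ē, Ī, Ū and their lowercase forms as missing.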

viktors1982 commented 4 years ago

Thanks for your work. I have attached a file with the correct Latvian characters:

lv_char.txt

rkcosmos commented 4 years ago

Is there anything between P and R (and between p and r)? They show up as blank spaces in my text editor.

viktors1982 commented 4 years ago

There is nothing between them.

[image]

visutida commented 4 years ago

Updated by #112