JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0

How to add support for CJK Symbols and Punctuation #1175

Open leotu opened 7 months ago

leotu commented 7 months ago

These characters are used often: "「」,。、"

Example:

[Screenshots: 2023-12-04 10 39 31, 2023-12-04_09 25 15]

https://zh.wikipedia.org/wiki/Unicode#%E4%B8%AD%E6%96%87%E8%BC%B8%E5%85%A5%E6%B3%95
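
For reference, here is a minimal sketch that identifies the code points of the characters quoted above, using only Python's standard `unicodedata` module. The helper names `describe` and `in_cjk_symbols_block` are made up for illustration and are not part of EasyOCR.

```python
import unicodedata

# The punctuation marks quoted in this issue.
PUNCTUATION = "「」,。、"


def describe(chars):
    """Return (character, code point, Unicode name) for each character."""
    return [(c, f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


def in_cjk_symbols_block(c):
    """True if c lies in the CJK Symbols and Punctuation block (U+3000-U+303F)."""
    return 0x3000 <= ord(c) <= 0x303F


for char, code_point, name in describe(PUNCTUATION):
    print(char, code_point, name, in_cjk_symbols_block(char))
```

Note that the fullwidth comma "," (U+FF0C) belongs to the Halfwidth and Fullwidth Forms block, not to CJK Symbols and Punctuation, so covering only the block named in the title would still miss it.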

After reading the links below, I still have no idea how to retrain my model easily:

https://github.com/JaidedAI/EasyOCR/blob/master/custom_model.md
https://github.com/JaidedAI/EasyOCR/tree/master/trainer

Is it possible to download the source code, append some characters to these files, and run some scripts to produce an enhanced model?

https://github.com/JaidedAI/EasyOCR/blob/master/easyocr/character/ch_sim_char.txt
https://github.com/JaidedAI/EasyOCR/blob/master/easyocr/character/ch_tra_char.txt
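
As a first step, one could check whether the punctuation is already listed in those character files. The sketch below assumes the repository has been cloned locally; the path is hypothetical, and `missing_characters` and `check_file` are illustrative helpers, not EasyOCR functions.

```python
from pathlib import Path

# The punctuation marks this issue asks about.
WANTED = "「」,。、"


def missing_characters(charset_text, wanted=WANTED):
    """Return the wanted characters that do not occur in charset_text."""
    present = set(charset_text)
    return [c for c in wanted if c not in present]


def check_file(path, wanted=WANTED):
    """Read a character list file as UTF-8 and report the missing characters."""
    return missing_characters(Path(path).read_text(encoding="utf-8"), wanted)


if __name__ == "__main__":
    # Hypothetical local path: adjust to wherever the repository is cloned.
    path = Path("EasyOCR/easyocr/character/ch_tra_char.txt")
    if path.exists():
        print(check_file(path))
```

As far as I understand, appending characters to these files is probably not enough on its own: the pretrained recognition network was trained with a fixed character set, so new characters would most likely also require retraining along the lines of the custom_model.md and trainer links above.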

leotu commented 7 months ago

OCR sample result (the "「」,。、" characters all disappear):

[Screenshot: 2023-12-04 10 50 54]