Language Issue: Chinese

WilliamLo commented 8 years ago

I tried to get the latest language data (chi_tra and chi_sim) from tessdata (https://github.com/tesseract-ocr/tessdata).

I also updated the init code to: G8RecognitionOperation *operation = [[G8RecognitionOperation alloc] initWithLanguage:@"eng+chi_tra+chi_sim"]; But the application crashes and I get the error below: read_params_file: parameter not found: allow_blob_division

Then I got the language data from langdata (https://github.com/tesseract-ocr/langdata), commented out "allow_blob_division F" in chi_tra.config, and tried to compile, but it failed with "Fontconfig error: Cannot load default config file Could not find font named AR_PL_UKai_TW".

My script: sudo ./tesstrain.sh --lang chi_tra --fontlist 'AR_PL_UKai_TW' --fonts_dir /Users/williamlo/Library/Fonts --langdata_dir /Users/williamlo/Documents/langdata --tessdata_dir /Users/williamlo/Documents/tessdata --output_dir /Users/williamlo/Documents/langdata2

Does anyone know what the issue is, or how I can get the proper language files?
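Editor's sketch: before rerunning tesstrain.sh, it may be worth checking whether fontconfig can see the font at all. The sketch below assumes, without confirmation from this thread, that fontconfig lists the family with spaces ("AR PL UKai TW") rather than the underscores passed to --fontlist. The helper names font_display_name and listing_has_font are hypothetical.

```shell
#!/bin/sh
# Sketch only: check whether fontconfig can see the font before training.
# Assumption: the family is listed with spaces ("AR PL UKai TW"),
# while the name passed to --fontlist used underscores.

# Hypothetical helper: turn underscores into spaces.
font_display_name() {
    printf '%s\n' "$1" | tr '_' ' '
}

# Hypothetical helper: reads a font listing on stdin and succeeds
# if it mentions "$1" (case-insensitive, literal match).
listing_has_font() {
    grep -i -q -F -- "$1"
}

name=$(font_display_name 'AR_PL_UKai_TW')

if command -v fc-list >/dev/null 2>&1; then
    if fc-list : family | listing_has_font "$name"; then
        echo "fontconfig can see: $name"
    else
        echo "fontconfig cannot see: $name"
    fi
else
    echo "fc-list is not installed; cannot check fonts"
fi
```

If the font is reported as not visible, the "Cannot load default config file" part of the error may point at the fontconfig setup itself rather than at the font file.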

ws233 commented 8 years ago

@WilliamLo, have you tried running the original Tesseract with your trained data and an image file? Please try that first, just to confirm that you've trained Tesseract correctly. Thanks!
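Editor's sketch of the check suggested above: run the stock tesseract command-line tool on a single image with the Chinese data, outside the iOS app. The function name verify_ocr, the image name sample.png, and the default data location are hypothetical. With Tesseract 3.x, TESSDATA_PREFIX is expected to name the directory that contains the tessdata folder.

```shell
#!/bin/sh
# Sketch: run the stock tesseract CLI on one image with a given language.
# Hypothetical default; for Tesseract 3.x this is the PARENT of "tessdata/".
TESSDATA_PREFIX="${TESSDATA_PREFIX:-$HOME/Documents/}"
export TESSDATA_PREFIX

# Hypothetical helper: $1 = image path, $2 = language code (e.g. chi_sim).
verify_ocr() {
    if command -v tesseract >/dev/null 2>&1 && [ -f "$1" ]; then
        # Writes the recognized text to ocr_check.txt in the current directory.
        tesseract "$1" ocr_check -l "$2"
    else
        echo "skipped: need the tesseract CLI and an image at $1"
    fi
}

verify_ocr sample.png chi_sim
```

If the command-line tool reports the same error or produces the same garbage, the problem lies in the language data or the engine rather than in the wrapper.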

mobyIsMe commented 8 years ago

I have hit the same issue. Have you figured it out, @WilliamLo?

WilliamLo commented 8 years ago

@ws233 I don't know why I can't compile the trained data; in the end I used the language files from a previous version. @sunwind2010 You can find the language files on SourceForge (https://sourceforge.net/projects/tesseract-ocr-alt/files/).

mobyIsMe commented 8 years ago

I downloaded chi_sim.traineddata, added it to the tessdata folder, and then modified this line: G8RecognitionOperation *operation = [[G8RecognitionOperation alloc] initWithLanguage:@"eng+chi_sim"];

Then I ran the app and it crashed.

mobyIsMe commented 8 years ago

This is the crash information:
2016-06-15 14:28:27.519 Template Framework Project[2918:868549] Snapshotting a view that has not been rendered results in an empty snapshot. Ensure your view has been rendered at least once before snapshotting or snapshot after screen updates.
Printing description of language->isa: __NSCFConstantString
read_params_file: parameter not found: allow_blob_division
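Editor's sketch: the "parameter not found: allow_blob_division" message suggests that the config inside the downloaded traineddata sets a parameter the bundled engine does not know. One way to test that, without retraining, is to unpack the traineddata with combine_tessdata, drop that line from the config, and repack. The helper names drop_param and patch_traineddata are hypothetical. Whether the rest of the newer data then works with the older engine is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print config file "$2" without the line
# that sets parameter "$1".
drop_param() {
    grep -v -E "^$1[[:space:]]" "$2"
}

# Hypothetical helper: unpack, patch, and repack one traineddata file in place.
# $1 = path to the .traineddata file, $2 = language code (e.g. chi_sim)
patch_traineddata() {
    command -v combine_tessdata >/dev/null 2>&1 || {
        echo "combine_tessdata not found" >&2
        return 1
    }
    work=$(mktemp -d) || return 1
    # -u unpacks every component to files named "<prefix><component>".
    combine_tessdata -u "$1" "$work/$2." || return 1
    drop_param allow_blob_division "$work/$2.config" > "$work/config.new"
    mv "$work/config.new" "$work/$2.config"
    # -o overwrites the listed component inside the traineddata file.
    combine_tessdata -o "$1" "$work/$2.config"
}
```

Usage, on a copy of the file rather than the original: patch_traineddata ./chi_sim.traineddata chi_sim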

mobyIsMe commented 8 years ago

@WilliamLo thanks a lot! I will try again:)

mobyIsMe commented 8 years ago

@WilliamLo I tried the file "tessract-ocr-3.0.2.chi_sim.tar.gz" downloaded from the link and added it to the tessdata folder, but recognition took a long time and the accuracy was very poor. For instance, I wrote just two characters, "天天", but the result was unreadable and did not contain even one recognizable character.

mobyIsMe commented 8 years ago

@WilliamLo like this: 一一入 ~ 一.~~瓤~一一一.._一一一一一一一一__一一一一〇一一一一一一一一_一一一一_一一一_一一一_一一一一一一一.一

… ..MWWMW

, r … 灬r v …. ~而麒、F′~则一一-.一,_一一，.

v. . .. A.me

ws233 commented 8 years ago

@sunwind2010, @WilliamLo Tesseract-OCR-iOS is just an iOS wrapper around the Tesseract OCR engine. Please post all your questions to the Tesseract project; they will definitely help you. I'm closing the issue, since it's not related to the iOS wrapper.

mobyIsMe commented 8 years ago

Sorry to bother you, thanks a lot!

13409795771 commented 7 years ago

Can it recognize Chinese?

gali8 / Tesseract-OCR-iOS

Language Issue: Chinese #256