dynobo / normcap

OCR powered screen-capture tool to capture information instead of images
https://dynobo.github.io/normcap/

Can NormCap recognize Chinese with chi_sim.traineddata? #104

Closed: yaojingzhe closed this issue 3 years ago

yaojingzhe commented 3 years ago

Dear Author:

Can I use NormCap to recognize Chinese on screen with chi_sim.traineddata? I tried, but it couldn't recognize Chinese. When I saved the same Chinese screen content to cn.jpg and ran the command tesseract cn.jpg cn -l chi_sim in CMD, I got cn.txt with the Chinese text recognized correctly.
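The CMD check in this comment (running Tesseract directly on a saved screenshot) can also be scripted, which makes it easy to compare against NormCap's output. This is a minimal sketch assuming the tesseract executable is on PATH; the helper names build_tesseract_cmd and run_ocr are hypothetical and not part of NormCap.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_tesseract_cmd(image: str, output_base: str, lang: str) -> List[str]:
    """Argument list for `tesseract <image> <output_base> -l <lang>`.

    Tesseract writes the recognized text to `<output_base>.txt`.
    """
    return ["tesseract", image, output_base, "-l", lang]


def run_ocr(image: str, output_base: str, lang: str = "chi_sim") -> str:
    """Run the Tesseract CLI and return the text it wrote (needs tesseract on PATH)."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_cmd(image, output_base, lang), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Equivalent of: tesseract cn.jpg cn -l chi_sim
    print(build_tesseract_cmd("cn.jpg", "cn", "chi_sim"))
```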

Do I need to configure NormCap to recognize Chinese? I couldn't find a configuration file for this.

I downloaded normcap_win64.zip

Thank you very much for your reply.

System: Windows 10

dynobo commented 3 years ago

Hi @yaojingzhe,

Yes, in general you can use NormCap with all languages available for Tesseract. But the compiled binary releases currently only include German and English support (because the language data files are quite large).

If you answer the following questions, I might be able to help you set up support for Chinese:

  1. What's your operating system? macOS? Linux? Windows 10?
  2. Did you install NormCap using the pip install normcap method and install Tesseract separately?
  3. Or did you use one of the provided NormCap binary releases? If yes, which version?

PS: Now that I know some users are interested in Chinese, I'm going to include chi_sim in the next release too, but this might take some time... :-)

yaojingzhe commented 3 years ago

Dear author:

I appreciate your prompt reply.

My system is Windows 10 x64.

The Tesseract I use is tesseract-ocr-w64-setup-v5.0.0-alpha.20201127.exe from https://github.com/UB-Mannheim/tesseract/wiki. It works well for recognizing Chinese.

Yes, I used the portable NormCap binary release, version 0.1.10. The file name is normcap_win64.zip.

After NormCap failed to recognize Chinese, I tried downloading and running normcap_win64_installer.exe v0.1.10. I also tried deleting my Python 3.8.8, installing Python 3.7.5 x64, and then running pip install normcap following your installation option C (package installation).

All three approaches failed to recognize Chinese, so I thought it was time to ask for your help.

Thanks again.

dynobo commented 3 years ago

Hi @yaojingzhe,

You have two options:

1. Extend your portable version (normcap_win64.zip) with Chinese: The binary releases use their bundled Tesseract 4. By default they currently include only English and German, but you can easily download the chi_sim language data file and drop it into the folder /normcap/tessdata/ of the portable version. Then you can start NormCap with the -l or --lang parameter to set the language, e.g. normcap -l chi_sim for Chinese or normcap -l chi_sim+eng for Chinese and English.

2. Use the Tesseract installed on your system with NormCap's pip package: If you install NormCap through pip install normcap, it will use the system's Tesseract with all its installed language data. Since you have a working Tesseract installation, you should be able to run with Chinese support using the language argument normcap -l chi_sim.
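Both options come down to the same question: does the Tesseract that NormCap uses actually have every language in the -l value? A minimal Python sketch of that check, assuming the portable build reads <lang>.traineddata files from its tessdata folder (option 1) and that the system Tesseract supports the --list-langs flag (option 2). All function names here are hypothetical, not NormCap APIs.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List


def requested_languages(lang_arg: str) -> List[str]:
    """Split a Tesseract-style language spec such as "chi_sim+eng"."""
    return [part for part in lang_arg.split("+") if part]


def missing_in_tessdata(lang_arg: str, tessdata_dir: Path) -> List[str]:
    """Option 1 (portable build): languages with no <lang>.traineddata file."""
    return [
        lang
        for lang in requested_languages(lang_arg)
        if not (tessdata_dir / f"{lang}.traineddata").is_file()
    ]


def parse_list_langs(output: str) -> List[str]:
    """Parse `tesseract --list-langs`: a header line, then one code per line."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [line for line in lines if not line.startswith("List of available languages")]


def missing_in_system_tesseract(lang_arg: str) -> List[str]:
    """Option 2 (pip package): languages the system's tesseract does not list."""
    exe = shutil.which("tesseract")
    if exe is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=True
    )
    # Depending on the Tesseract version, the list may go to stderr instead of stdout.
    installed = set(parse_list_langs(result.stdout or result.stderr))
    return [lang for lang in requested_languages(lang_arg) if lang not in installed]


if __name__ == "__main__":
    # Demo of option 1 with a throwaway folder standing in for normcap/tessdata.
    with tempfile.TemporaryDirectory() as tmp:
        tessdata = Path(tmp) / "normcap" / "tessdata"
        tessdata.mkdir(parents=True)
        (tessdata / "eng.traineddata").write_bytes(b"")
        print(missing_in_tessdata("chi_sim+eng", tessdata))  # ['chi_sim']
```

An empty result means every requested language is available, so normcap -l chi_sim+eng should find its data.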

Caveat: On Windows, NormCap has only been tested with Tesseract 4.1; I do not know whether 5alpha will work. You can grab the 4.1 installer here: tesseract-ocr-w32-setup-v4.1.0.20190314.exe, along with the language data. But if you test-run with 5alpha first, I would certainly be interested to know whether it works :-)
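Because the caveat hinges on which Tesseract version is installed, a quick version check can tell whether a setup matches the tested 4.1 configuration. This is a sketch under assumptions: it relies on the first line of tesseract --version looking like "tesseract v5.0.0-alpha.20201127" or "tesseract 4.1.1", and the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_tesseract_version(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `tesseract --version` output.

    Handles lines such as 'tesseract v5.0.0-alpha.20201127',
    'tesseract v4.1.0.20190314' or 'tesseract 4.1.1'.
    """
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_tested_version(version: Optional[Tuple[int, int]]) -> bool:
    """Per the caveat above, only Tesseract 4.1 is tested with NormCap on Windows."""
    return version == (4, 1)


def system_tesseract_version() -> Optional[Tuple[int, int]]:
    """Return the system tesseract's (major, minor), or None if not found/parsed."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # The banner may appear on stdout or stderr depending on the release.
    return parse_tesseract_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = system_tesseract_version()
    if version is None:
        print("tesseract not found or version not recognized")
    elif is_tested_version(version):
        print(f"Tesseract {version[0]}.{version[1]}: tested configuration")
    else:
        print(f"Tesseract {version[0]}.{version[1]}: untested, results may vary")
```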

Please let me know if this helped you!

yaojingzhe commented 3 years ago

Dear author:

I downloaded normcap_win64.zip and used NormCap with -l chi_sim; it works well and can recognize both Chinese and English. I will soon try installing NormCap through pip install normcap to test Tesseract 5alpha and tell you the result.

Your program is excellent and has helped me a lot.

And also thank you for your kindness.