Closed yaojingzhe closed 3 years ago
Hi @yaojingzhe,
Yes, in general you can use NormCap with all languages available for Tesseract. But the compiled binary releases currently only include German and English support (because the language data files are quite large).
If you answer the following questions, I might be able to help you set up support for Chinese:
Did you use one of the binary releases, or did you use the pip install normcap method and install Tesseract independently?
PS: Now that I know that some users are interested in Chinese, I'm going to include chi_sim in the next release, too, but this might take some time... :-)
Dear author:
I appreciate your prompt reply.
My system is Windows 10 (64-bit).
Tesseract was installed with tesseract-ocr-w64-setup-v5.0.0-alpha.20201127.exe from https://github.com/UB-Mannheim/tesseract/wiki. It works well for recognizing Chinese.
Yes, I used the portable normcap binary release. The version is 0.1.10 and the file name is normcap_win64.zip.
After that failed to recognize Chinese, I tried downloading and running normcap_win64_installer.exe v0.1.10. I also uninstalled my Python 3.8.8, installed Python 3.7.5 x64, and ran pip install normcap, following your installation method C (package installation).
All three approaches failed to recognize Chinese, so I think it is time to ask for your help.
Thanks again.
Hi @yaojingzhe,
You have two options:
1. Extend your portable version normcap_win64.zip with Chinese
The binary releases use the bundled Tesseract 4. By default it currently includes only English and German, but you can easily download the chi_sim language data file and drop it into the folder /normcap/tessdata/ of the portable version. Then you can start normcap with the -l or --lang parameter to set the language, e.g. normcap -l chi_sim for Chinese, or normcap -l chi_sim+eng for Chinese and English support.
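As a rough illustration of option 1, the sketch below places a language file into the portable version's tessdata folder and builds the value for the -l parameter. The download URL (the main branch of the tesseract-ocr/tessdata repository) and the helper names are assumptions made for this example, not part of normcap; the folder layout follows the /normcap/tessdata/ path mentioned above.

```python
from pathlib import Path
import urllib.request

# Assumption: language files are fetched from the tesseract-ocr/tessdata
# repository on GitHub. Adjust the URL if you use a different source.
TESSDATA_URL = "https://github.com/tesseract-ocr/tessdata/raw/main/{lang}.traineddata"


def traineddata_target(portable_root: Path, lang: str) -> Path:
    """Return where <lang>.traineddata belongs inside the portable folder."""
    return portable_root / "normcap" / "tessdata" / f"{lang}.traineddata"


def lang_argument(*langs: str) -> str:
    """Join language codes with '+', e.g. 'chi_sim+eng'."""
    return "+".join(langs)


def install_language(portable_root: Path, lang: str) -> Path:
    """Download the language file unless it is already present (hypothetical helper)."""
    target = traineddata_target(portable_root, lang)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(TESSDATA_URL.format(lang=lang), str(target))
    return target
```

For example, after unpacking the zip into a folder of your choice, calling install_language(Path("path/to/unpacked/folder"), "chi_sim") should leave chi_sim.traineddata in place, and lang_argument("chi_sim", "eng") gives the string to pass after -l.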
2. Use the Tesseract installed on your system with normcap's pip package
If you install normcap through pip install normcap, it will use the system's Tesseract with all its installed language data. So as you have a working Tesseract installation, you should be able to run with Chinese support using the language argument: normcap -l chi_sim.
Caveat: On Windows, normcap is only tested with Tesseract 4.1; I do not know whether the 5 alpha will work. You can grab the 4.1 installer here: tesseract-ocr-w32-setup-v4.1.0.20190314.exe, along with the language data. But if you try the 5 alpha first, I would certainly be interested to know whether it works :-)
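Before trying option 2, it may help to confirm that the system Tesseract actually has chi_sim installed. The following is a minimal sketch that calls tesseract --list-langs and parses the result; the function names are made up for this example, and it assumes the tesseract executable is on the PATH and prints a "List of available languages" header followed by one language code per line.

```python
import subprocess
from typing import List


def parse_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line, e.g. "List of available languages (3):".
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_langs(tesseract_cmd: str = "tesseract") -> List[str]:
    """Ask the system Tesseract which languages it has (assumes it is on PATH)."""
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: some builds write the list to stderr, so both streams are read.
    return parse_langs(result.stdout + "\n" + result.stderr)
```

If "chi_sim" in installed_langs() is true, normcap -l chi_sim should at least find the language data; whether recognition works with the 5 alpha is a separate question.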
Please let me know, if this helped you!
Dear author:
I downloaded normcap_win64.zip and used normcap with -l chi_sim. It works well and can recognize both Chinese and English. I will soon try installing normcap through pip install normcap to test Tesseract 5 alpha and tell you the result.
Your program is excellent and has helped me greatly.
And also thank you for your kindness.
Dear Author:
May I use normcap to recognize Chinese on screen with chi_sim.traineddata? I tried, but normcap could not recognize Chinese. I saved a Chinese screenshot as cn.jpg and ran this in CMD: tesseract cn.jpg cn -l chi_sim. I got cn.txt, and the Chinese was recognized.
Do I need to configure normcap to recognize Chinese? I could not find a config file to set.
I downloaded normcap_win64.zip
Thank you very much for your reply.
System: Windows 10