Open iszhi opened 1 year ago
I'm not against it at all, but it's not really doable for me, since I have zero knowledge of Mandarin. The NLP processors don't support it afaik, but tesseract (the tool doing the OCR) supports both traditional and simplified Chinese. I don't know if that would help?
For date recognition I would need a PR or at the very least all the info from here
@eikek Since the NLP processors don't support Mandarin, can you add it via tesseract? (PS: I don't know much about either NLP or tesseract.)
I think tesseract has support for simplified and traditional Chinese - which one is better? It is possible to add it to the Docker image and add a language option to the UI.
Simplified Chinese is used in mainland China, and traditional Chinese is used in Taiwan and Hong Kong. Simplified Chinese has the larger user base, but if possible, I recommend installing both.
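For reference, a rough sketch of what running OCR with both language packs could look like, assuming the tesseract binary and its `chi_sim` and `chi_tra` trained data are installed. The helper names `build_tesseract_command` and `ocr_image` are made up for illustration and are not how the project actually calls tesseract.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_tesseract_command(
    image_path: str, languages: Sequence[str] = ("chi_sim", "chi_tra")
) -> List[str]:
    """Build a tesseract CLI call that writes the recognized text to stdout."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract accepts several languages joined with '+', e.g. chi_sim+chi_tra.
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def ocr_image(
    image_path: str, languages: Sequence[str] = ("chi_sim", "chi_tra")
) -> str:
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

`tesseract --list-langs` shows which language packs are installed, so it is easy to check whether both are present in an image.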
Stanford CoreNLP supports (mainland) Chinese.
Stanford CoreNLP: An integrated suite of natural language processing tools for English, Spanish, and (mainland) Chinese in Java, including tokenization, part-of-speech tagging, named entity recognition, parsing, and coreference.
I also have a lot of documents written in Mandarin. Can you add this too?