GoldWaterFall / Chat-Health

chat health - empowering health management through clinical AI
Apache License 2.0

OCR engine Tesseract does not work for other languages #10

Open · YuSun7543503 opened this issue 1 year ago

YuSun7543503 commented 1 year ago

During our experiments, tesseract-ocr performed very well at recognizing English text and numbers, but it did not perform well on Chinese text. It may be good enough for a demo, but if we want to apply it to the Chinese market, we need to explore other OCR methods.
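
For anyone reproducing this comparison, one thing worth ruling out first is whether Tesseract was actually given Chinese language data, since it only recognizes the languages passed with `-l` (multiple codes are joined with `+`, and `chi_sim` is the tessdata code for Simplified Chinese). The sketch below assumes the `tesseract` CLI is installed and on PATH; the helper names and the `chi_sim+eng` default are hypothetical, and this is not what the Chat-Health project did. It only checks configuration and is not claimed to close the accuracy gap described above.

```python
import shutil
import subprocess
from typing import List, Sequence

# Hypothetical default: Simplified Chinese plus English, which the tesseract
# CLI receives as a single "+"-joined argument ("chi_sim+eng").
DEFAULT_LANGS = ("chi_sim", "eng")


def build_tesseract_cmd(image_path: str, langs: Sequence[str] = DEFAULT_LANGS) -> List[str]:
    """Build a tesseract CLI call that prints the recognized text to stdout."""
    if not langs:
        raise ValueError("at least one language code is required")
    # tesseract <image> <outputbase> -l <langs>; "stdout" as outputbase prints instead of writing a file.
    return ["tesseract", image_path, "stdout", "-l", "+".join(langs)]


def installed_langs() -> List[str]:
    """Return codes reported by `tesseract --list-langs`, or [] if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    proc = subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True, check=False)
    # Depending on the version, the list goes to stdout or stderr; skip the "List of ..." header line.
    lines = "\n".join([proc.stdout, proc.stderr]).splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.startswith("List of")]


def ocr_text(image_path: str, langs: Sequence[str] = DEFAULT_LANGS) -> str:
    """Run tesseract on one image, failing early if a requested language pack is missing."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    available = set(installed_langs())
    missing = [lang for lang in langs if lang not in available]
    if missing:
        raise RuntimeError(f"missing tesseract language data: {missing}")
    proc = subprocess.run(build_tesseract_cmd(image_path, langs), capture_output=True, text=True, check=True)
    return proc.stdout
```

Calling `ocr_text("scan.png")` raises a clear error when `chi_sim` is not installed, rather than silently falling back to English-only recognition, which makes it easier to tell a configuration problem apart from a genuine accuracy limit.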

YuSun7543503 commented 1 year ago

I searched for some possible alternatives for OCR; these links may be useful:

  1. OCR based on Tencent Cloud: https://cloud.tencent.com/product/ocr;
  2. OCR based on Baidu AI Platform: https://ai.baidu.com/tech/ocr.

YuSun7543503 commented 1 year ago

This might be useful as well: OCR based on Xunfei Cloud: https://www.xfyun.cn/services/common-ocr.

linxyu1 commented 1 year ago

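Hello, I'm guessing you're a native Chinese-speaking developer. Did you later switch to calling an API such as Baidu's for Chinese recognition? Didn't you train Tesseract with your own data?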

YuSun7543503 commented 1 year ago

Hello, no, we didn't. We were only building a demo-style presentation that used English recognition, and we have no further plans for now.