adithya-s-k / omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
https://docs.cognitivelab.in
GNU General Public License v3.0
4.37k stars 350 forks

Parsing Chinese docx files produces a lot of garbled text #50

Open mxm-web-develop opened 2 weeks ago

mxm-web-develop commented 2 weeks ago

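PDFs parse fine, but .docx files come out garbled. Also, Docker now crashes on its own whenever the file is even slightly large. Is this a problem with my machine's configuration?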

gzgogogz commented 2 weeks ago

Chinese doc files come out garbled because the OCR the author uses does not support Chinese. This is being improved.