Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

Compatibility Issue with Chinese Text in Document Parsing #3267

Open: JIAQIA opened this issue 1 week ago

JIAQIA commented 1 week ago

FYI: https://github.com/Unstructured-IO/unstructured/pull/3096

Thanks for your review.

JIAQIA commented 1 week ago

Sorry for the delayed response. I recently had some changes at work and have just formed a team dedicated to LLM-related development, so I couldn't address the previous issues in time.

I just finished testing on my Mac, and the results are consistent with the main branch. Note that even the main branch can't fully pass the tests on my Mac, possibly because of some OCR packages or other issues.

So, please review it again.

@MthwRobinson FYI