Open peixin-lin opened 2 weeks ago
Hi @peixin-lin - thanks for reporting. We'll take a look as soon as we're able.
@christinestraub - This would be a good one to look at once you free up.
I found that setting the environment variable DEFAULT_PADDLE_LANG to "ch" works for now.
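For anyone who wants to try the same workaround from Python rather than from the shell, here is a minimal sketch. It assumes the variable has to be set before Unstructured loads the Paddle OCR model, which I have not verified against the source. The helper name configure_paddle_language and the file name chinese.pdf are made up for illustration.

```python
import os


def configure_paddle_language(lang: str = "ch") -> str:
    """Set DEFAULT_PADDLE_LANG for this process and return the value now in effect.

    Hypothetical helper: only the variable name and the "ch" value come from
    this thread; everything else is an assumption.
    """
    os.environ["DEFAULT_PADDLE_LANG"] = lang
    return os.environ["DEFAULT_PADDLE_LANG"]


# Set the variable first, on the assumption that it is read when the OCR model loads.
active_lang = configure_paddle_language("ch")

# Only then import and call Unstructured (left commented out so the sketch
# runs without the library installed; "chinese.pdf" is a placeholder path):
# from unstructured.partition.pdf import partition_pdf
# elements = partition_pdf(filename="chinese.pdf", strategy="hi_res")
```

Setting the variable inside the process keeps the workaround next to the code that depends on it, but exporting it in the shell before starting Python should have the same effect.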
I specified the languages parameter with the value ["chi", "eng"], but it did not work. When I upload a Chinese PDF document, Unstructured still loads an English model. I checked the source code and found that in unstructured\partition\utils\ocr_models\paddle_ocr.py the init function receives no argument for specifying the language.
Is there a way to work around this?