Open ZephryLiang opened 2 weeks ago
The error message comes from this function in the source code:
def get_agent(cls) -> OCRAgent:
    """Get the configured OCRAgent instance.

    The OCR package used by the agent is determined by the `OCR_AGENT` environment variable.
    """
    ocr_agent_cls_qname = cls._get_ocr_agent_cls_qname()
    try:
        return cls.get_instance(ocr_agent_cls_qname)
    except (ImportError, AttributeError):
        raise ValueError(
            f"Environment variable OCR_AGENT must be set to an existing OCR agent module,"
            f" not {ocr_agent_cls_qname}."
        )
Which agent can I use, please?
Closing in favor of #3187. Looks like the same issue.
Hi @LiangZeFenglzf, you need to install additional dependencies to use PaddleOCR. You can use the following shell script to install them:
#!/usr/bin/env bash
# aarch64 requires a custom build of paddlepaddle
if [ "${ARCH}" = "aarch64" ]; then
    python3 -m pip install unstructured.paddlepaddle
else
    python3 -m pip install paddlepaddle
fi
python3 -m pip install unstructured.paddleocr
Also, you don't need to pass the ocr_agent param, so:
import os
from unstructured.partition.pdf import partition_pdf

os.environ["OCR_AGENT"] = "unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle"
elements = partition_pdf(file=f, strategy='ocr_only')
It doesn't work. pip list shows: paddlepaddle 2.6.1
@ZephryLiang, please mention how you installed unstructured, which versions of the libraries (unstructured, unstructured-inference) you have, and which OS you're on (Linux, macOS).
I'm having the same issue with the latest container version. The error stack is "ImportError: /home/notebook-user/.local/lib/python3.11/site-packages/paddle/fluid/libpaddle.so: cannot open shared object file: No such file or directory", but the libpaddle.so file does exist.
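Since get_agent catches ImportError and AttributeError and re-raises them as a ValueError, the message about OCR_AGENT can hide the real cause (for example a missing package or a shared library that fails to load). A minimal sketch for surfacing the underlying exception is below. The helper name diagnose_ocr_agent is hypothetical and not part of unstructured, and it assumes the qualified name is split at the last dot into a module path and a class name, which may differ from how the library resolves it internally.

```python
import importlib
import os


def diagnose_ocr_agent(qualified_name: str) -> str:
    """Try to import `module.ClassName` and report the real failure, if any.

    Hypothetical helper: it only approximates how an OCR agent class might be
    resolved from the OCR_AGENT value (assumed split at the last dot).
    """
    module_name, _, class_name = qualified_name.rpartition(".")
    if not module_name:
        return f"ValueError: {qualified_name!r} is not a dotted module.ClassName path"
    try:
        # Importing the module is where a missing dependency would show up.
        module = importlib.import_module(module_name)
        # A wrong class name would show up here as AttributeError.
        agent_cls = getattr(module, class_name)
    except (ImportError, AttributeError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return f"OK: {agent_cls.__module__}.{agent_cls.__qualname__}"


if __name__ == "__main__":
    # Reads the same environment variable the thread sets before partition_pdf.
    print(diagnose_ocr_agent(os.environ.get("OCR_AGENT", "")))
```

Running this in the same environment (or container) where partition_pdf fails should print either an OK line or the original exception type and message, which is usually more actionable than the ValueError.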
Describe the bug
My code:
os.environ["OCR_AGENT"] = "unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle"
elements = partition_pdf(file=f, ocr_agent=ocr_agent, strategy='ocr_only')
Error:
Environment variable OCR_AGENT must be set to an existing OCR agent module, not unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle.
Expected behavior
I want to extract elements from a PDF. How can I do this?