Unstructured-IO / unstructured-api

Apache License 2.0

ModuleNotFoundError: No module named 'unstructured.partition.utils.ocr_models' #286

Closed: jashdalvi closed this issue 1 year ago

jashdalvi commented 1 year ago

I pulled the latest version of the unstructured-api repo. This issue is specific to using Paddle for OCR on a GPU. These are the steps I followed:

  1. make install
  2. pip install onnxruntime-gpu
  3. pip install paddlepaddle-gpu
  4. pip install "unstructured.PaddleOCR"
  5. export ENTIRE_PAGE_OCR=paddle
  6. export TABLE_OCR=paddle
  7. make run-web-app

This was working fine with version 0.0.47.
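Before running `make run-web-app`, you can check whether the installed `unstructured` package actually provides the module named in the error. The sketch below uses only the standard library (`importlib.util.find_spec`). The helper name `module_available` is hypothetical. Its result only tells you whether the module can be located in the current environment. It does not establish why the module is missing or which release changed it.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located without executing the module itself.

    For dotted names, find_spec imports the parent packages first, so a
    missing parent raises ModuleNotFoundError; treat that as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    # Module path copied from the error message in this issue.
    target = "unstructured.partition.utils.ocr_models"
    if module_available(target):
        print(f"{target} is importable")
    else:
        print(f"{target} is missing in this environment")
```

Run it with the same Python interpreter that serves the API. Otherwise it may inspect a different set of installed packages.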

crapthings commented 11 months ago

How do I get Paddle working? I set:

export ENTIRE_PAGE_OCR=paddle
export TABLE_OCR=paddle

but the request failed with:

{
    "detail": "tesseract is not installed or it's not in your PATH. See README file for more information."
}
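The error says the `tesseract` binary must be on your `PATH`. A quick way to confirm this from Python is the standard-library `shutil.which`, which performs a PATH lookup similar to the shell's. The sketch assumes the API process inherits the same `PATH` as the shell where you run it. The helper name `find_on_path` is hypothetical. The check only explains the missing binary. It does not explain why Tesseract was used even though the Paddle variables were set.

```python
import shutil
from typing import Optional


def find_on_path(name: str) -> Optional[str]:
    """Return the full path of executable `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_on_path("tesseract")
    if path is None:
        print("tesseract was not found on PATH")
    else:
        print(f"tesseract found at {path}")
```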

yuming-long commented 11 months ago

Hi @crapthings thanks for reaching out!

Sorry about the confusion: the environment variables ENTIRE_PAGE_OCR and TABLE_OCR are being deprecated.

To make sure Paddle is working, you might need to: