Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

bug/wrong import #1763

Closed. kavya98 closed this issue 1 year ago

kavya98 commented 1 year ago

Describe the bug: When using Unstructured with LangChain, creating the loader shown under "To Reproduce" raises an import error.

To Reproduce: loader = UnstructuredPDFLoader('pdf_path', mode='elements', strategy='fast')

Expected behavior: No error.

Additional context: The error comes from the line of code linked in the original issue ("line which causes error").

PDFResourceManager is not present in pdfminer.converter, but it is present in pdfminer.pdfinterp, so the import should be changed to: from pdfminer.pdfinterp import PDFResourceManager
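
To narrow down which module exposes PDFResourceManager in a given environment, a small check like the one below may help. It is only a sketch: `find_provider` is a hypothetical helper written for this issue, not part of unstructured or pdfminer, and the result depends on which pdfminer version (if any) is installed.

```python
import importlib
from typing import Iterable, Optional


def find_provider(name: str, modules: Iterable[str]) -> Optional[str]:
    """Return the first module in `modules` that imports and exposes `name`.

    Hypothetical diagnostic helper; returns None if no candidate works.
    """
    for module_name in modules:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or its parent package) is not installed or failed to import.
            continue
        if hasattr(module, name):
            return module_name
    return None


# Candidate locations mentioned in this issue. The result is None when
# pdfminer is not installed at all.
provider = find_provider(
    "PDFResourceManager",
    ["pdfminer.pdfinterp", "pdfminer.converter"],
)
print(provider)
```

Posting the printed value together with the installed pdfminer version would make the report easier to reproduce.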

Coniferish commented 1 year ago

Hi @kavya98! Can you provide the document that's raising this error? I'm having trouble reproducing it. Also, what IDE are you using?