Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

bug/Import Error for unstructured.partition.html #2894

Closed SuvroBaner closed 2 months ago

SuvroBaner commented 2 months ago

Describe the bug: When I import the modules below, I get the following error:

from unstructured.partition.html import partition_html
from unstructured.partition.pptx import partition_pptx

TypeError: add_chunking_strategy() missing 1 required positional argument: 'func'
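For readers unfamiliar with this kind of message, here is a minimal sketch of how Python produces a TypeError with this wording. The `add_chunking_strategy` defined here is a hypothetical stand-in, not the library's code, and `partition_example` is an invented name. The sketch assumes a decorator written to be applied bare (`@add_chunking_strategy`) that is instead applied with parentheses, which can happen when code written for one version of a decorator runs against another.

```python
import functools


def add_chunking_strategy(func):
    """Hypothetical stand-in: a decorator meant to be applied without parentheses."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


def reproduce_error() -> str:
    """Apply the decorator with parentheses and return the resulting error text."""
    try:
        # The parentheses call the decorator with no arguments, so `func` is missing.
        @add_chunking_strategy()
        def partition_example():
            return []
    except TypeError as exc:
        return str(exc)
    return ""


message = reproduce_error()
print(message)
```

Whether this is what happened in the reporter's environment is not confirmed in the thread. The sketch only shows the general mechanism behind the wording of the error.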

To Reproduce: Python 3.9.6 on an Apple M2 Mac running macOS Sonoma 14.4.1.

!pip install "unstructured[all-docs]"
!brew install libmagic
!brew install libxml2
!brew install libxslt

Expected behavior: The modules should import without error.

Environment Info: The requirements.txt:

chromadb==0.4.22
langchain==0.1.5
langchain-community==0.0.17
langchain-core==0.1.19
langchain-openai==0.0.5
openai==1.11.1
tiktoken==0.5.2
"unstructured[md,pdf,pptx]"
unstructured-client==0.16.0
unstructured==0.12.3
unstructured-inference==0.7.23
unstructured.pytesseract==0.3.12
urllib3==1.26.18
python-dotenv==1.0.1
panel==1.3.8
ipython==8.18.1
python-pptx==0.6.23
pdf2image==1.17.0
pdfminer==20191125
opencv-python==4.9.0.80
pikepdf==8.13.0
pypdf==4.0.1

christinestraub commented 2 months ago

Hi @SuvroBaner, did you try the latest version of the unstructured library, 0.13.2?
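Before upgrading, it can help to confirm which versions are actually installed in the active environment, since a notebook kernel may not match the requirements file. The following is a minimal sketch using only the standard library. The helper name `installed_versions` is made up for illustration, and the package names are taken from the requirements list above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(
    ["unstructured", "unstructured-inference", "unstructured-client"]
)
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

If the reported versions differ from the pinned ones, reinstalling into a clean virtual environment is one way to rule out a mixed install.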

scanny commented 2 months ago

@SuvroBaner also, can you paste in the full stack trace?
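One way to capture the full stack trace as text, for example from a notebook cell, is sketched below with the standard `importlib` and `traceback` modules. The helper name `import_with_traceback` is hypothetical. The module path is the one from the report.

```python
import importlib
import traceback


def import_with_traceback(module_name):
    """Try to import a module; return the formatted traceback on failure, else None."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # format_exc() returns the same text Python would print for the error.
        return traceback.format_exc()
    return None


trace = import_with_traceback("unstructured.partition.html")
print(trace or "import succeeded")
```

The printed text can be pasted into the issue as-is. It shows which file and line raised the error, which the one-line message alone does not.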

scanny commented 2 months ago

Closing as resolved. @SuvroBaner feel free to reopen if you're still having trouble.