Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.8k stars 626 forks

pptx initial error #3082

Closed. OtokoNoIzumi closed this issue 1 month ago

OtokoNoIzumi commented 2 months ago

Describe the bug: An error occurs when importing the pptx partitioner (version 0.14.2).

To Reproduce: Run this line of demo code from deeplearning: from unstructured.partition.pptx import partition_pptx

Expected behavior: The import completes cleanly, with no error.

Screenshots

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[2], line 10
      7 from unstructured_client.models.errors import SDKError
      9 from unstructured.partition.html import partition_html
---> 10 from unstructured.partition.pptx import partition_pptx
     11 from unstructured.staging.base import dict_to_elements, elements_to_json

File ~\anaconda3\envs\workspace\Lib\site-packages\unstructured\partition\pptx.py:13
     10 from tempfile import SpooledTemporaryFile
     11 from typing import IO, Any, Iterator, Optional, Protocol, Sequence
---> 13 import pptx
     14 from pptx.presentation import Presentation
     15 from pptx.shapes.autoshape import Shape

ModuleNotFoundError: No module named 'pptx'


MthwRobinson commented 2 months ago

Hi @OtokoNoIzumi - did you install the package with pip install "unstructured[pptx]"? That would install the missing dependency.
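
One way to confirm whether the optional dependency is present before running the import is sketched below. It only checks whether the top-level pptx module (the one named in the traceback) can be found in the current environment, and it prints the install command suggested above if it cannot. The helper name missing_extra_hint is hypothetical and is not part of unstructured.

```python
import importlib.util
from typing import Optional


def missing_extra_hint(module_name: str, extra: str) -> Optional[str]:
    """Return an install hint if the top-level module cannot be found, else None.

    Hypothetical helper for illustration; not part of the unstructured API.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is not None:
        return None
    return (
        f"No module named '{module_name}'. "
        f'Try: pip install "unstructured[{extra}]"'
    )


# The traceback reports that the 'pptx' module is missing, so check for that name.
hint = missing_extra_hint("pptx", "pptx")
print(hint or "pptx was found; the partition_pptx import should not fail on it.")
```

If the hint is printed, run the pip command in the same environment as the notebook kernel (the traceback shows a conda environment named workspace), then restart the kernel before retrying the import.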