Closed Tilemachoc closed 1 month ago
@Tilemachoc You'll need to install the PDF extras (quote the package spec so shells like zsh don't expand the brackets):
pip install "unstructured[pdf]"
https://docs.unstructured.io/open-source/installation/full-installation
Closing as assumed resolved, but feel free to reopen if you're still having trouble :)
TEST CODE:
import langchain
import os
from unstructured.partition.pdf import partition_pdf
from unstructured.staging.base import elements_from_json

filename = "file.pdf"

elements = partition_pdf(
    filename=filename,
    strategy="hi_res",
    infer_table_structure=True,
    model_name="yolox",
)

print(elements)
for elem in elements:
    print("------")
    print(elem.metadata.text_as_html)
ERROR:
line 5, in
import unstructured.partition.pdf
ModuleNotFoundError: No module named 'pdfminer'
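
If it helps to confirm the fix took effect before rerunning the script, here is a minimal sketch that checks whether the module named in the traceback can be imported in the current environment. The helper name `missing_modules` and the `REQUIRED` list are made up for illustration and are not part of unstructured; the list only contains `pdfminer` because that is the one module the error above names, so the PDF extras may well pull in more than this checks.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    # find_spec returns None when a top-level module is not installed
    # in the environment running this interpreter.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: only the module named in the traceback is checked here.
REQUIRED = ["pdfminer"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print(f"Missing modules: {missing}")
        print('Try: pip install "unstructured[pdf]"')
    else:
        print("Required modules are importable.")
```

Run it with the same interpreter (and virtual environment) that runs the test code; a mismatch between the environment where pip installed the extras and the one running the script would produce the same ModuleNotFoundError.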