Closed: rs-03 closed this issue 3 days ago.
Hi @rs-03, this is addressed in https://github.com/Unstructured-IO/unstructured-inference/pull/359. You'll need to upgrade unstructured-inference to 0.7.36.
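Before re-running `partition_pdf`, you can confirm the upgrade from Python. The following is a minimal sketch that uses the standard library's `importlib.metadata`. The helper names `parse_version` and `has_fill_cells_fix` are hypothetical, and the 0.7.36 threshold comes from the reply above.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum release that, per the maintainer's reply, contains the fix (PR #359).
MIN_VERSION = (0, 7, 36)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' version string into a comparable tuple of ints.

    Assumption: the installed version is a simple numeric release. For
    pre-release or local version tags, use packaging.version.Version instead.
    """
    return tuple(int(part) for part in text.split(".")[:3])


def has_fill_cells_fix(installed: str) -> bool:
    """Hypothetical helper: True if the installed version is >= 0.7.36."""
    return parse_version(installed) >= MIN_VERSION


if __name__ == "__main__":
    try:
        installed = version("unstructured-inference")
    except PackageNotFoundError:
        print("unstructured-inference is not installed")
    else:
        status = "ok" if has_fill_cells_fix(installed) else "upgrade needed"
        print(f"unstructured-inference {installed}: {status}")
```

If the check reports that an upgrade is needed, `pip install --upgrade "unstructured-inference>=0.7.36"` pulls in the fixed release.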
Thanks @christinestraub. Upgrading unstructured-inference to 0.7.36 fixed my issue.
Describe the bug

```
File /opt/conda/lib/python3.12/site-packages/unstructured_inference/models/tables.py:667, in fill_cells(cells)
    650 def fill_cells(cells: List[dict]) -> List[dict]:
    651     """fills the missing cells in the table by adding a cells with empty text
    652     where there are no cells detected by the model.
    653
   (...)
    665
    666     """
--> 667     table_rows_no = max({row for cell in cells for row in cell["row_nums"]})
    668     table_cols_no = max({col for cell in cells for col in cell["column_nums"]})
    669     filled = np.zeros((table_rows_no + 1, table_cols_no + 1), dtype=bool)

ValueError: max() iterable argument is empty
```

To Reproduce

```python
from unstructured.partition.pdf import partition_pdf

raw_pdf_elements = partition_pdf(
    filename=path + "/file_name.pdf",
    # Unstructured first finds embedded image blocks
)
```

Expected behavior

Text and Table elements should have been extracted.
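The traceback shows `fill_cells` calling `max()` on the set of detected row numbers. That set can be empty, presumably because the table model detected no cells for a table, and in that case `max()` raises. The sketch below reproduces the failure in plain Python and shows one possible guard. It is not the code from PR #359, whose exact change isn't described here. The cell dicts carry only the `row_nums`/`column_nums` keys visible in the traceback, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified cell shape: only the keys seen in the traceback.
Cell = Dict[str, List[int]]


def max_row_unguarded(cells: List[Cell]) -> int:
    # Same expression as line 667 in the traceback; raises ValueError
    # ("max() iterable argument is empty") when no rows were detected.
    return max({row for cell in cells for row in cell["row_nums"]})


def grid_shape(cells: List[Cell]) -> Optional[Tuple[int, int]]:
    """Return (n_rows, n_cols) of the table grid, or None if nothing was detected."""
    rows = {row for cell in cells for row in cell["row_nums"]}
    cols = {col for cell in cells for col in cell["column_nums"]}
    if not rows or not cols:
        return None  # empty detection: caller should skip filling instead of crashing
    return max(rows) + 1, max(cols) + 1


def missing_positions(cells: List[Cell]) -> List[Tuple[int, int]]:
    """List (row, col) grid positions that no detected cell covers, in row-major order."""
    shape = grid_shape(cells)
    if shape is None:
        return []
    covered = {
        (row, col)
        for cell in cells
        for row in cell["row_nums"]
        for col in cell["column_nums"]
    }
    n_rows, n_cols = shape
    return [(r, c) for r in range(n_rows) for c in range(n_cols) if (r, c) not in covered]


if __name__ == "__main__":
    detected = [
        {"row_nums": [0], "column_nums": [0]},
        {"row_nums": [1], "column_nums": [1]},
    ]
    print(missing_positions(detected))  # positions that would get empty-text cells
    print(missing_positions([]))        # no crash on an empty table
```

An alternative guard is Python's built-in `max(iterable, default=-1)`. Combined with the `+ 1` in the `np.zeros` call, it would yield an empty 0×0 grid instead of raising. An explicit early return makes the "no cells detected" case easier to see.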
Environment Info

- OS version: Linux-6.1.92-99.174.amzn2023.x86_64-x86_64-with-glibc2.35
- Python version: 3.12.3
- unstructured version: 0.14.7
- unstructured-inference version: 0.7.35
- pytesseract version: 0.3.10
- Torch version: 2.3.1
- Detectron2 version: None
- PaddleOCR version: None
- Libmagic version: file-5.41 (magic file from /etc/magic:/usr/share/misc/magic)
- LibreOffice version: LibreOffice 7.3.7.2 30(Build:2)