Closed pawel-kmiecik closed 5 days ago
Your pull request is modifying functions with the following pre-existing issues:
📄 File: unstructured/partition/pdf.py
Function | Unhandled Issue |
---|---|
_partition_pdf_or_image_local | IndexError: list index out of range /general/v0/g... Event Count: 12 |
Drawing bboxes for OCR layout doesn't appear to be working.
PDF: 2023_SustainabilityReport_33.pdf
elements = partition_pdf(filename="2023_SustainabilityReport_33.pdf", strategy=strategy, analysis=True)
Results:
This should be fixed now:
This PR adds the ability to draw bounding boxes for each layout (extracted, inferred, OCR, and final) and to dump the object detection (OD) model output to a JSON file for easier analysis.
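To illustrate the kind of JSON dump described above, here is a minimal sketch using only the Python standard library. It is not the code from this PR: `LayoutBox`, `dump_layout`, the field names, and the output file name are all hypothetical, and the real dump format may differ.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class LayoutBox:
    """Hypothetical record for one detected region (not the library's own type)."""

    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    source: str  # assumed values: "extracted", "inferred", "ocr", "final"


def dump_layout(boxes, page_number, out_path):
    """Write one page's boxes to a JSON file, grouped by layout source."""
    grouped = {}
    for box in boxes:
        record = asdict(box)
        # Use the source as the grouping key instead of repeating it per box.
        source = record.pop("source")
        grouped.setdefault(source, []).append(record)
    payload = {"page_number": page_number, "layouts": grouped}
    Path(out_path).write_text(json.dumps(payload, indent=2))
    return payload


if __name__ == "__main__":
    # Made-up coordinates, purely for illustration.
    boxes = [
        LayoutBox(10, 20, 200, 60, "Title", "inferred"),
        LayoutBox(12, 22, 198, 58, "Title", "ocr"),
        LayoutBox(10, 20, 200, 60, "Title", "final"),
    ]
    out = Path(tempfile.gettempdir()) / "page_1_layout.json"
    dump_layout(boxes, page_number=1, out_path=out)
    print(out.read_text())
```

Grouping by source keeps each layout stage (for example OCR versus final) separately inspectable, which is the comparison the bbox drawings are meant to support.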