Open Arslan-Mehmood1 opened 1 month ago
Which version is still working? Thank you @Arslan-Mehmood1
What versions of unstructured and unstructured-inference libraries are you using?
I've got the same error with unstructured@0.15.14 and unstructured-inference@0.7.40
The problem occurs starting with unstructured-inference@0.7.37. With unstructured-inference@0.7.36 and below everything works.
Snippet of code in which the error occurs:
from unstructured.partition.pdf import partition_pdf

filename = "your_pdf.pdf"
elements = partition_pdf(
    filename=filename,
    strategy="hi_res",
    infer_table_structure=True,
    model_name="yolox",
)
> I've got the same error with unstructured@0.15.14 and unstructured-inference@0.7.40
I have the same issue with the same conditions.
@amysudarat unstructured==0.15.10 unstructured_inference==0.7.36
Actually, the problem was with unstructured_inference. I traced back to the time the code last worked for me and then checked which version of unstructured_inference had been released around that time.
@christinestraub Code is working with the following: unstructured==0.15.10 unstructured_inference==0.7.36
@theogiraudon Code is working with the following: unstructured==0.15.10 unstructured_inference==0.7.36
@Qwedon Code is working with the following: unstructured==0.15.10 unstructured_inference==0.7.36
@chung-codes Code is working with the following: unstructured==0.15.10 unstructured_inference==0.7.36
I faced the same issue with the following versions: unstructured==0.15.14 unstructured-inference==0.7.41
I reverted to the following versions, which work: unstructured==0.15.10 unstructured-inference==0.7.36
I have the same issue with the same conditions:
The problem occurs with the following: unstructured==0.15.14 unstructured_inference==0.8.0
The code worked with: unstructured==0.15.9 unstructured_inference==0.7.36
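The version reports above suggest a quick local check. The following is a minimal sketch, not an official tool: it reads the installed unstructured-inference version with the standard library's importlib.metadata and compares it against 0.7.37, the first release reported as failing in this thread. The helper names (parse_version, is_reported_affected, check_installed) are made up for illustration, and this thread does not establish whether later releases fix the problem.

```python
import re
from importlib import metadata

# First unstructured-inference release reported as failing in this thread.
# This is taken from the comments above, not from any official changelog.
FIRST_REPORTED_BAD = (0, 7, 37)


def parse_version(text):
    """Turn '0.7.36' into (0, 7, 36), keeping only the first three numbers."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def is_reported_affected(version_text):
    """True if the version is at or above the first reported-failing release."""
    return parse_version(version_text) >= FIRST_REPORTED_BAD


def check_installed(package="unstructured-inference"):
    """Describe how the installed version compares with the reports above."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    if is_reported_affected(installed):
        status = "at or above the first reported-failing release"
    else:
        status = "below the first reported-failing release"
    return f"{package}=={installed} is {status}"


if __name__ == "__main__":
    print(check_installed())
```

If the check lands in the reported-failing range, pinning to the combination several commenters confirmed (unstructured==0.15.10 with unstructured_inference==0.7.36) is the workaround described in this thread.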
I'm getting AttributeError: 'list' object has no attribute 'element_coords' for
partitioned = partition_pdf(
    filename=file,
    strategy=self.args.doc_load_strategy,
    extract_images_in_pdf=False,
    languages=["deu", "eng"],
    password="",
    kwargs={
        "check_extractable": False
    }
)
My versions:
[[package]]
name = "unstructured"
version = "0.14.10"
[[package]]
name = "unstructured-inference"
version = "0.8.1"
I was using 0.14.10 because of another issue, so I'm upgrading to 0.16.5 to see if it helps.
I am having problems that I didn't have two weeks ago.
Floating text misclassification: Floating text elements are being incorrectly recognized and classified as tables. element['type'] is parsed incorrectly. This is a new behavior that didn't occur in previous versions of Unstructured.
Floating text merging: Floating text (such as headers or side notes) is not being processed as individual elements. Instead, it's being merged into the same paragraph with nearby content, as you can see in the attached image.
My versions:
unstructured==0.16.3
unstructured-client==0.26.1
unstructured-inference==0.8.1
unstructured-ingest==0.1.1
unstructured.pytesseract==0.3.13
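To make the misclassification easier to report, it may help to count element types and list the "Table" elements that look too short to be real tables. The sketch below assumes the elements are available as plain dicts with "type" and "text" keys (the shape referenced by element['type'] above). The function names, the sample data, and the 12-word threshold are hypothetical and only for illustration.

```python
from collections import Counter


def summarize_types(elements):
    """Count how many elements were assigned each type."""
    return Counter(el["type"] for el in elements)


def suspicious_tables(elements, max_words=12):
    """Return Table elements whose text is short enough to be a stray
    header or side note. max_words is an arbitrary illustrative cutoff."""
    return [
        el
        for el in elements
        if el["type"] == "Table" and len(el.get("text", "").split()) <= max_words
    ]


# Hypothetical sample data, not taken from the document in this report.
sample = [
    {"type": "Title", "text": "Annual Report"},
    {"type": "Table", "text": "Confidential draft"},
    {"type": "NarrativeText", "text": "Revenue grew compared with last year."},
    {
        "type": "Table",
        "text": "Region Q1 Q2 Q3 Q4 North 10 12 11 13 South 9 8 10 12 East 7 7 8 9",
    },
]

if __name__ == "__main__":
    print(summarize_types(sample))
    for el in suspicious_tables(sample):
        print("Possible floating text classified as Table:", el["text"])
```

Running the same two functions on output from a working and a failing version would give a compact before/after comparison to attach to the issue.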
Parsed document, where the floating text is underlined: