Unstructured-IO / unstructured-api

Apache License 2.0

Issue parsing a certain pdf file #279

Closed: jashdalvi closed this issue 1 year ago

jashdalvi commented 1 year ago

I'm getting this error when parsing a particular PDF file: "unsupported operand type(s) for -: 'float' and 'NoneType'"

Here is the PDF file for reference: doc_ZUcPhd7XkxOsfB.pdf

I also tried the unstructured API, and the same issue persists.
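For context, that message is Python's standard TypeError for subtracting None from a float (float on the left, None on the right). The thread does not say which value inside unstructured was None. The sketch below only illustrates how the message arises and what a None guard looks like. The `width` helper and the coordinate values are hypothetical and are not taken from the unstructured codebase.

```python
from typing import Optional


def width(x0: float, x1: Optional[float]) -> Optional[float]:
    """Hypothetical helper: width of a box whose right edge may be missing."""
    if x1 is None:
        # Guard: without this check, x1 - x0 would raise a TypeError.
        return None
    return x1 - x0


# An unguarded float - None reproduces the message from the report.
try:
    _ = 612.0 - None
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for -: 'float' and 'NoneType'

print(width(10.0, None))  # None
print(width(10.0, 25.5))  # 15.5
```

Returning None is only one option. A real fix could instead skip the element or fall back to a default coordinate, depending on what the caller expects.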

awalker4 commented 1 year ago

Hi there, thanks for the reproducer! We're tracking this over here, and I believe it's fixed in the latest release; we just need to verify.

awalker4 commented 1 year ago

I'll close this and link it in the other issue.

awalker4 commented 1 year ago

Confirmed that this file works in unstructured 0.10.20 and the latest API release, 0.0.52. The fix will be in the hosted API within the hour.