Unstructured-IO / unstructured-js-client

A TypeScript client for the Unstructured hosted API
MIT License

PDF files without a filename cannot be split #100

Open awalker4 opened 3 months ago

awalker4 commented 3 months ago

See this comment. If a PDF file does not have .pdf in its filename, we return the message "Given file is not a PDF. Continuing without splitting." The issue is that loadPdf here should not return as soon as the file extension check fails. Instead, let's try to load the file with pdf-lib, and only return if that fails.
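
To illustrate the proposed control flow, here is a minimal sketch. It is not the client's actual code: the name loadPdfOrNull and the injected parse parameter are hypothetical. In the real client, parse would presumably be pdf-lib's PDFDocument.load; it is passed in here only so the sketch has no dependencies.

```typescript
// Hypothetical sketch: try to parse first, and give up only if parsing fails.
// `parse` stands in for a real PDF parser such as pdf-lib's PDFDocument.load.
type Parser<T> = (bytes: Uint8Array) => Promise<T>;

async function loadPdfOrNull<T>(
  bytes: Uint8Array,
  parse: Parser<T>,
): Promise<T | null> {
  try {
    // No filename or extension check: the parser decides whether this is a PDF.
    return await parse(bytes);
  } catch {
    // Only at this point would the caller report
    // "Given file is not a PDF. Continuing without splitting."
    return null;
  }
}
```

With this shape, a file named "report" or an unnamed upload is still split as long as its bytes parse as a PDF, and the fallback message is reserved for content that really cannot be parsed.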

Ziao commented 2 months ago

The same thing happens when passing in a blob, which obviously has no filename. Shouldn't the library simply use the MIME type or trust that the passed contentType is correct?
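
A rough sketch of what such a filename-independent check could look like. The helper name looksLikePdf and its parameters are hypothetical and not part of the client. It assumes the caller can supply the declared content type (for a Blob, its type property) and the first few bytes of the file.

```typescript
// The ASCII bytes of "%PDF-", which begin the header of a PDF file.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Hypothetical helper: trust an explicit content type, otherwise sniff the header.
function looksLikePdf(head: Uint8Array, contentType?: string): boolean {
  if (contentType !== undefined) {
    // Drop parameters such as "; charset=..." and compare case-insensitively.
    const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
    if (mime === "application/pdf") return true;
  }
  // Fall back to comparing the leading bytes against the PDF header.
  return PDF_MAGIC.every((byte, i) => head[i] === byte);
}
```

The header check is only a heuristic: some PDFs carry a few stray bytes before the header, so actually attempting to parse the file remains the more reliable test.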

awalker4 commented 1 month ago

Yes, apologies for the delay here! This will be addressed in #115