Unstructured-IO / unstructured-api

Apache License 2.0

FR/add partition option to return pdf images and pass it through the api #306

Closed: ron-unstructured closed this issue 9 months ago

ron-unstructured commented 11 months ago

Description: The extract_images_in_pdf parameter is available in partition_pdf but not through the API. With the new GPT-4V multimodal model, extracting images from source documents would be helpful.

To Reproduce

from unstructured.partition.pdf import partition_pdf
partition_pdf(filename, strategy="hi_res", extract_images_in_pdf=True)

awalker4 commented 10 months ago

It’s important that the API can return images to match the library. The best approach is likely base64-encoded files returned in a metadata field. So, we'll need to:

New plan - add this functionality to partition. The api just needs to pass the right parameter down.

cragwolfe commented 10 months ago

Note that additional elements beyond Image, such as Table (per https://github.com/Unstructured-IO/unstructured/pull/2229), may now be extracted as images, so this should include that functionality as well.

faileon commented 10 months ago

Multimodal RAG is becoming a hot topic, so I would love to see this implemented officially.

So far my workaround is to spin up a custom FastAPI service that returns Image elements and adds the base64 representation to them. I can submit a PR to this repo if it's welcome.
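For readers following along, here is a rough sketch of the "add the base64 representation" step. It is not the workaround's actual code: it assumes elements are already serialized to plain dicts, and the helper name attach_image_base64, the image_bytes_by_id lookup, and the image_base64 metadata key are all hypothetical names chosen for illustration.

```python
import base64


def attach_image_base64(elements, image_bytes_by_id, field="image_base64"):
    """Return copies of the elements with base64 image data in their metadata.

    Assumptions (hypothetical, not the library's API):
    - each element is a dict with "type", "element_id" and optional "metadata"
    - image_bytes_by_id maps element_id -> raw image bytes
    - the payload is stored under the metadata key given by `field`
    """
    result = []
    for element in elements:
        # Copy so the caller's elements are left untouched.
        copy = {**element, "metadata": dict(element.get("metadata", {}))}
        raw = image_bytes_by_id.get(element.get("element_id"))
        # Table is included because tables may also be extracted as images.
        if element.get("type") in ("Image", "Table") and raw is not None:
            copy["metadata"][field] = base64.b64encode(raw).decode("ascii")
        result.append(copy)
    return result
```

A client would then decode the field with base64.b64decode to recover the original image bytes.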

awalker4 commented 10 months ago

A PR would always be welcome! Our current plan for this is to add an option to partition to return the images directly. This should mean the lift on the api side is just passing the parameter along. If this is what your workaround is doing, happy to review in the unstructured repo.
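As a sketch of what "just passing the parameter along" could look like on the API side: the snippet below filters request form fields down to a known set and forwards them to a partition callable. The names build_partition_kwargs, handle_request, and the two parameter sets are hypothetical, and the partition function is injected so the example does not depend on the real library.

```python
# Hypothetical allow-lists; the real API may accept different parameters.
BOOL_PARAMS = {"extract_images_in_pdf"}
PASSTHROUGH_PARAMS = {"strategy"} | BOOL_PARAMS


def build_partition_kwargs(form):
    """Keep only known parameters and coerce boolean form values."""
    kwargs = {}
    for name, value in form.items():
        if name not in PASSTHROUGH_PARAMS:
            continue  # ignore fields the partition call does not understand
        if name in BOOL_PARAMS:
            # Form fields arrive as strings, so "true"/"1"/"yes" become True.
            value = str(value).strip().lower() in ("true", "1", "yes")
        kwargs[name] = value
    return kwargs


def handle_request(filename, form, partition):
    """Forward the filtered form parameters to the given partition callable."""
    return partition(filename, **build_partition_kwargs(form))
```

With this shape, supporting a new partition option in the API is a one-line change to the allow-list, which matches the idea that the heavy lifting stays in the library.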