Closed: ron-unstructured closed this 9 months ago
It’s important that the API can return images to match the library. The best approach is likely base64-encoded files returned in a metadata field. So, we'll need to:
New plan: add this functionality to partition. The API just needs to pass the right parameter down.
Note that additional elements beyond Image, such as Table, may now be extracted as images per https://github.com/Unstructured-IO/unstructured/pull/2229, so this should include that functionality as well.
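To make the base64-in-metadata idea concrete, here is a minimal sketch using only the Python standard library. It assumes elements are serialized as plain dicts with `type` and `metadata` keys; the helper names and the `image_base64` / `image_mime_type` field names are illustrative assumptions, not a confirmed part of the library or the API.

```python
import base64

# Assumption: element types whose cropped image should travel with the response.
IMAGE_BLOCK_TYPES = {"Image", "Table"}


def attach_image_payload(element: dict, image_bytes: bytes,
                         mime_type: str = "image/jpeg") -> dict:
    """Return a copy of `element` with the image embedded as base64 text.

    Elements whose type is not in IMAGE_BLOCK_TYPES are returned unchanged.
    The metadata field names used here are hypothetical.
    """
    if element.get("type") not in IMAGE_BLOCK_TYPES:
        return element
    metadata = dict(element.get("metadata", {}))  # copy, do not mutate the input
    # base64 output is ASCII, so it is safe to place in a JSON response body.
    metadata["image_base64"] = base64.b64encode(image_bytes).decode("ascii")
    metadata["image_mime_type"] = mime_type
    return {**element, "metadata": metadata}


def decode_image_payload(element: dict) -> bytes:
    """Client-side counterpart: recover the raw image bytes from the metadata."""
    return base64.b64decode(element["metadata"]["image_base64"])
```

A client would call `decode_image_payload` on each returned Image or Table element and write the bytes to disk or pass them to a multimodal model. Base64 inflates the payload by roughly a third, which is the trade-off for keeping the whole response in a single JSON document.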
Multimodal RAG is becoming a hot topic, so I would love to see this implemented officially.
So far my workaround is to spin up a custom FastAPI service that returns Image elements and adds their base64 representation to them. I can submit a PR to this repo if it's welcome.
A PR would always be welcome! Our current plan for this is to add an option to partition to return the images directly. This should mean the lift on the API side is just passing the parameter along. If this is what your workaround is doing, happy to review in the unstructured repo.
Description
This parameter is available in partition_pdf but not through the API. With the new GPT-4V multimodal model, extracting images from source documents will be helpful.
To Reproduce