Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License
5.57k stars 3.74k forks

Making inline images embeddable and searchable #1724

Open emreonal12 opened 1 week ago

emreonal12 commented 1 week ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [X] feature request
- [X] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

I want to use the GPT-4 Turbo with vision functionality to embed and index both text and (inline) images from PDF files for subsequent searching.

The docs here (https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/gpt4v.md) describe the general pipeline, mentioning that:

It is unclear to me whether the entire page PNG is being embedded, or if this refers to embedding just the inline images that have been extracted from the PDF/page PNG (e.g. inline figures/charts). If inline images are embedded (instead of the whole page), how does the Azure OCR tool detect them and separate them from the unstructured text? Is there such a feature offered?

Any clarifications would be greatly appreciated.

pamelafox commented 1 week ago

It only embeds entire pages right now. I'm not sure if Document Intelligence also extracts images separately. If it does, you could potentially do what you're saying.
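
If a layout analysis step does turn out to return figure regions, cropping them out of the existing page PNG before embedding might look roughly like the sketch below. This is only an illustration under assumptions: it supposes each figure region arrives as a flat polygon `[x1, y1, x2, y2, ...]` in inches relative to the page, and that the page PNG was rendered at a known DPI. The helper name `polygon_to_crop_box` is hypothetical and is not part of this repo or of any Azure SDK.

```python
from math import ceil, floor
from typing import Optional, Sequence, Tuple


def polygon_to_crop_box(
    polygon: Sequence[float],
    dpi: int = 150,
    padding_px: int = 0,
    page_size_px: Optional[Tuple[int, int]] = None,
) -> Tuple[int, int, int, int]:
    """Convert a flat polygon in inches to a (left, top, right, bottom) pixel box.

    Assumption: `polygon` is [x1, y1, x2, y2, ...] in inches, measured from the
    top-left corner of the page, and the page image was rendered at `dpi`.
    """
    if len(polygon) < 6 or len(polygon) % 2 != 0:
        raise ValueError("polygon must hold at least 3 (x, y) pairs")

    xs = polygon[0::2]
    ys = polygon[1::2]

    # Round outward so the crop never cuts into the figure.
    left = floor(min(xs) * dpi) - padding_px
    top = floor(min(ys) * dpi) - padding_px
    right = ceil(max(xs) * dpi) + padding_px
    bottom = ceil(max(ys) * dpi) + padding_px

    # Keep the box inside the page image when its size is known.
    left, top = max(left, 0), max(top, 0)
    if page_size_px is not None:
        width, height = page_size_px
        right, bottom = min(right, width), min(bottom, height)

    return left, top, right, bottom
```

The resulting box could then be passed to an image library's crop call (for example Pillow's `Image.crop`), and each cropped figure embedded in place of, or alongside, the whole-page image.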