A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
This issue is for a: (mark with an `x`)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

I want to use the GPT-4 Turbo with vision functionality to embed and index both text and (inline) images from PDF files for subsequent searching.

The docs here (https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/gpt4v.md) describe the general pipeline.

It is unclear to me whether the entire page PNG is being embedded, or whether only the inline images extracted from the PDF page (e.g. figures and charts) are embedded. If inline images are embedded instead of the whole page, how does the Azure OCR tool detect them and separate them from the unstructured text? Is such a feature offered?

Any clarifications would be greatly appreciated.

---

It only embeds entire pages right now. I'm not sure whether Document Intelligence also extracts images separately; if it does, you could potentially do what you're describing.
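Following up on the reply: if Document Intelligence (or any layout tool) did report per-figure bounding regions, turning those regions into pixel crops on the rendered page PNG would be simple arithmetic. The sketch below is hypothetical, not a confirmed API contract — the flat `polygon` format of alternating x/y pairs in inches, and the DPI values, are illustrative assumptions:

```python
def polygon_to_pixel_box(polygon, dpi=72):
    """Convert a flat [x0, y0, x1, y1, ...] polygon in inches
    (hypothetical layout-tool output) into an integer pixel
    bounding box (left, top, right, bottom) at the given DPI."""
    xs = polygon[0::2]  # every even index is an x coordinate
    ys = polygon[1::2]  # every odd index is a y coordinate
    return (
        int(min(xs) * dpi),
        int(min(ys) * dpi),
        int(max(xs) * dpi) + 1,  # +1 so the box fully covers the figure
        int(max(ys) * dpi) + 1,
    )

# Example: a figure spanning inches (1.0, 2.0) to (4.5, 5.25) on the page,
# with the page PNG rendered at 144 DPI.
box = polygon_to_pixel_box([1.0, 2.0, 4.5, 2.0, 4.5, 5.25, 1.0, 5.25], dpi=144)
print(box)  # (144, 288, 649, 757)
```

The resulting box could then be cropped out of the page PNG (e.g. with Pillow's `Image.crop`) and embedded on its own. Whether Document Intelligence actually exposes such per-figure regions is exactly the open question in this thread.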