Azure / azure-search-vector-samples

A repository of code samples for Vector search capabilities in Azure AI Search.
https://azure.microsoft.com/products/search
MIT License
690 stars 285 forks

Vector search with HTML text and images together #208

Open MatteoAntonini opened 3 months ago

MatteoAntonini commented 3 months ago

I've seen several examples in the repository that handle text and images separately, but what would be the best approach for running RAG over HTML documents that contain both? So far I've only used the text in the HTML, vectorizing a content field that holds the HTML text. In particular, I'm not sure whether to include the images up front and create a single skillset, or to keep text and images separate and then merge the two indexes.

farzad528 commented 3 months ago

Hey @MatteoAntonini, thanks for raising this question! When working with HTML content that includes both text and images, you need to handle the two differently because they are inherently different types of data. Here are some things you could do:
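As a starting point for handling the two data types differently, here is a minimal sketch that splits an HTML document into its visible text and its image references before either is embedded. It uses only Python's standard `html.parser` module. The names `TextAndImageExtractor` and `split_html` are hypothetical and chosen for illustration; this is not code from the repository, and it is not an Azure AI Search skill.

```python
from html.parser import HTMLParser


class TextAndImageExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and <img> sources from HTML."""

    # Content inside these tags is not visible text, so it is skipped.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.images = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            attr_map = dict(attrs)
            src = attr_map.get("src")
            if src:  # ignore <img> tags that have no usable source
                self.images.append({"src": src, "alt": attr_map.get("alt") or ""})

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def split_html(html):
    """Return (visible_text, images) so each can go to its own embedding path."""
    parser = TextAndImageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.images


sample = '<body><h1>Setup</h1><p>Plug in the cable.</p><img src="port.png" alt="Port"></body>'
text, images = split_html(sample)
print(text)
print(images)
```

The text can then go to a text embedding model and the image sources to an image embedding path. Keeping the `alt` text next to each `src` gives you a cheap textual signal for an image even if you decide not to embed the image itself.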

Whichever approach you choose, consider how relevant the images in the HTML are to your specific use case. If the images carry information that users are likely to search for, it may be worth including them in the RAG pipeline. If they don't, they can add noise and reduce the overall relevance of the search results.
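If you keep separate text and image indexes, one way to act on that relevance judgment is to down-weight the image results when merging. The sketch below uses weighted Reciprocal Rank Fusion, a common rank-merging technique. That choice is an assumption on my part, not something this thread prescribes. The function name `weighted_rrf`, the `image_weight` parameter, and the document IDs are hypothetical, and `k=60` is just the value conventionally used with RRF.

```python
def weighted_rrf(text_ranking, image_ranking, image_weight=1.0, k=60):
    """Merge two ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each list contributes weight / (k + rank) per document, with rank starting at 1.
    Setting image_weight below 1.0 reduces the influence of image matches;
    0.0 means image matches add nothing to the score.
    """
    scores = {}
    for weight, ranking in ((1.0, text_ranking), (image_weight, image_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a text index and an image index.
text_hits = ["page-a", "page-b", "page-c"]
image_hits = ["page-c", "page-d"]

print(weighted_rrf(text_hits, image_hits, image_weight=1.0))
print(weighted_rrf(text_hits, image_hits, image_weight=0.25))
```

Lowering `image_weight` is a simple knob to try when the images turn out to add more noise than signal for your queries.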

Hope this helps!