Azure / azure-search-vector-samples

A repository of code samples for Vector search capabilities in Azure AI Search.
https://azure.microsoft.com/products/search
MIT License

Azure Hybrid Search results are not consistent #209

Open AVIN8233 opened 3 months ago

AVIN8233 commented 3 months ago

Hi team,

I am running Azure hybrid search over 12 PDFs (423 chunks) that I embed into my vector store, and retrieving the top 12 chunks for a query.

Code snippet:

    vector_query = VectorizedQuery(
        vector=query_embeddings, k_nearest_neighbors=60,
        fields="contentVector")

    results = await self.search_client.search(
        search_text=query,
        vector_queries=[vector_query],
        top=12,
        filter=filter_expression
    )

The problem I am facing is that the top 12 results are not consistent: they change from one run to the next for the same query. To solve this I tried exhaustive KNN as well, but it didn't help. From reading the Azure blogs, I found that some stochasticity may come from BM25, so I set the parameter scoring_statistics='global' and also added a session_id.

Code snippet:

    vector_query = VectorizedQuery(
        vector=query_embeddings, k_nearest_neighbors=60,
        fields="contentVector", exhaustive=True)

    results = await self.search_client.search(
        search_text=query,
        vector_queries=[vector_query],
        top=12,
        filter=filter_expression,
        scoring_statistics='global',
        # session_id='abcd1234xyz',
    )
    results = await self._format_metadata(results)

Could the team please advise how to get the same, consistent output from hybrid search? I want to optimize for search accuracy (how relevant the retrieved chunks are) and for the time taken to embed and retrieve.

farzad528 commented 3 months ago

@AVIN8233 Can you execute the two queries independently (BM25 and vector search) and see if the order is consistent? Also, how many replicas do you have on your search service?
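
One way to run that check is sketched below. It is only an illustration, not an official diagnostic: the helper names `first_divergence` and `divergence_report` are made up for this example. The idea is to collect the ordered document keys from several repeated runs of the keyword-only query, then of the vector-only query, and report the first rank at which each run departs from the first one. The commented wiring assumes the async client and variables from the snippets above, and assumes the index key field is called `id`.

```python
from typing import List, Sequence


def first_divergence(a: Sequence[str], b: Sequence[str]) -> int:
    """Return the first rank at which two orderings differ, or -1 if identical."""
    for rank, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return rank
    # Same prefix: identical if the lengths match, otherwise they differ
    # at the first rank that only the longer list has.
    return -1 if len(a) == len(b) else min(len(a), len(b))


def divergence_report(runs: Sequence[Sequence[str]]) -> List[int]:
    """Compare every run against the first run; one entry per later run."""
    if not runs:
        raise ValueError("need at least one run to compare against")
    baseline = runs[0]
    return [first_divergence(baseline, run) for run in runs[1:]]


# Hypothetical wiring (assumes the key field is named "id"):
#
#   async def keyword_ids():
#       results = await search_client.search(
#           search_text=query, top=12, filter=filter_expression)
#       return [doc["id"] async for doc in results]
#
#   async def vector_ids():
#       results = await search_client.search(
#           search_text=None, vector_queries=[vector_query],
#           top=12, filter=filter_expression)
#       return [doc["id"] async for doc in results]
#
#   keyword_runs = [await keyword_ids() for _ in range(5)]
#   print(divergence_report(keyword_runs))
```

If every entry in the report is -1 for one query type but not the other, that points to which half of the hybrid query is reordering between runs.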