Azure / azure-search-vector-samples

A repository of code samples for Vector search capabilities in Azure AI Search.
https://azure.microsoft.com/products/search
MIT License

KNearestNeighborsCount - odd ranking #95

Open HB-mac opened 1 year ago

HB-mac commented 1 year ago

I'm executing a vector search against Azure Cognitive Search, using the (currently) latest version of the Azure.Search.Documents NuGet package (11.5.0-beta.4).

If I execute a search with KNearestNeighborsCount set to 3, the closest match (in my opinion) is returned 3rd in the list.

However, if I do the same search with KNearestNeighborsCount set to 10 (I want to see more search results), the match I mention above is returned 8th in the list.

This doesn't make sense to me. I would have thought the match would always appear in the same position, for any value of KNearestNeighborsCount >= 3.

Q. How do I return 10 results, but have my closest match appear at position 3?

var vector = new SearchQueryVector
{
    KNearestNeighborsCount = numNearestNeighbours,
    Fields = { nameof(DocumentModel.vector1) },
    Value = embeddings
};

var searchOptions = new SearchOptions
{
    Vectors = { vector },
    Size = numNearestNeighbours,
    Select = { idFieldName },
};

SearchResults<SearchDocument> response = searchClient.Search<SearchDocument>(null, searchOptions);

var result = new List<string>();

foreach (SearchResult<SearchDocument> searchResult in response.GetResults())
{
    var documentId = $"{searchResult.Document[idFieldName]}";
    result.Add(documentId);
}

HB-mac commented 1 year ago

A week on, I wonder if anyone can comment on this issue please?

ratdoux commented 1 year ago

I'm currently in the same situation as you. Have you found a solution?

HB-mac commented 1 year ago

No solution found, sorry

farzad528 commented 1 year ago

Hi @ratdoux and @hb-mac, we'd like to investigate this issue further. Could you please open a support request?

HB-mac commented 1 year ago

Hi @farzad528 - how do we do that? Thanks

farzad528 commented 1 year ago

https://learn.microsoft.com/en-us/azure/azure-portal/supportability/how-to-create-azure-support-request

HB-mac commented 1 year ago

Hi @farzad528. Just wondering: if you suspect the issue is with the Azure Cognitive Search product itself, rather than the code in this repo, why not raise the issue via your internal escalation process, rather than relying on customers like me to raise a support request? (It's a genuine question, and I suspect we'd get a better/quicker response if you could do that.)

farzad528 commented 1 year ago

Hi @HB-mac, unfortunately I couldn't reproduce the issue on my end with the same .NET SDK version and code you used, which is why I encourage you to open a support request so that a dedicated support engineer can work with you in a secure and structured manner.

Another thing I suggest trying: do you still receive different results when using the REST API? I'd like to figure out whether the issue lies in the .NET SDK or in the search engine itself. Additionally, I uploaded an updated .NET code sample that uses the latest NuGet package, 11.5.0-beta.5: https://github.com/Azure/cognitive-search-vector-pr/blob/main/demo-dotnet/code/Program.cs

robertklee commented 1 year ago

@HB-mac can you share:

Also please confirm that your search query vector is identical for both requests.

As Farzad mentioned, if you can try using the REST API and see if the issue is present, that would also be helpful.

We ask you to raise a support ticket because it lets you securely share any documents you wish to share, and it routes the issue to an on-call engineer who can set aside time to investigate it.
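For anyone attempting the REST comparison suggested above, here is a rough sketch in Python of how the two request bodies might be built so that only k and top differ. It assumes the 2023-11-01 REST API request shape (vectorQueries with kind, vector, fields, k), which is newer than the preview SDK used in the original post. The service name, index name, id field and the helper build_search_request are all hypothetical placeholders. It only builds the requests; sending them would additionally need an api-key header.

```python
API_VERSION = "2023-11-01"  # assumption: stable API version with the vectorQueries shape


def build_search_request(service, index_name, embedding, k,
                         id_field="id", vector_field="vector1"):
    """Build (url, body) for a pure vector query. Hypothetical helper, not SDK code."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index_name}"
        f"/docs/search?api-version={API_VERSION}"
    )
    body = {
        "select": id_field,
        "top": k,  # maximum number of results returned
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": list(embedding),
                "fields": vector_field,
                "k": k,  # nearest neighbours requested from the vector query
            }
        ],
    }
    return url, body


# Placeholder embedding; a real one would come from the embedding model.
embedding = [0.12, -0.03, 0.57]

_, body_k3 = build_search_request("my-service", "my-index", embedding, k=3)
_, body_k10 = build_search_request("my-service", "my-index", embedding, k=10)

# Sanity check before comparing rankings: the query vector must be identical.
same_vector = (
    body_k3["vectorQueries"][0]["vector"] == body_k10["vectorQueries"][0]["vector"]
)
print("identical query vector:", same_vector)
```

If the rankings still differ with identical vectors over REST, that would point at the search service rather than the SDK.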

HB-mac commented 1 year ago

Hi @farzad528 @robertklee - thanks for your replies. I should be able to get that info for you on Friday.

0Dmitry commented 9 months ago

I'm having a similar issue with the Node.js SDK v12.0.0. The kNearestNeighborsCount parameter behaves like top: the number of results is always equal to the kNearestNeighborsCount specified. Is this the intended behaviour?

robertklee commented 9 months ago

@0Dmitry

Consider the parameter top as controlling the maximum final number of results you get. The kNearestNeighborsCount controls the number of results from the vector query.

For pure single-field vector queries, they will behave the same way. Here's an example of how it works.

vector query is received -> execute vector query to produce kNearestNeighborsCount results -> using this result set, return top results.
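The flow above can be imitated with a toy in-memory index. This is only an illustrative Python sketch, not how the service is implemented: it uses an exhaustive cosine-similarity scan (the service may use an approximate algorithm instead), and the names vector_query, search and toy_index are made up for the example.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_query(index, query, k):
    """Return the ids of the k most similar documents, best first."""
    ranked = sorted(
        index,
        key=lambda doc_id: cosine_similarity(index[doc_id], query),
        reverse=True,
    )
    return ranked[:k]


def search(index, query, k, top):
    """Step 1: the vector query yields k candidates. Step 2: keep at most top of them."""
    candidates = vector_query(index, query, k)
    return candidates[:top]


toy_index = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.5, 0.5],
    "d": [0.0, 1.0],
}
query = [1.0, 0.0]

print(search(toy_index, query, k=3, top=10))  # ['a', 'b', 'c'] (k is the limit)
print(search(toy_index, query, k=10, top=2))  # ['a', 'b'] (top is the limit)
```

In this sketch the result count is the smaller of k and top, which matches the observation that the two parameters look interchangeable for a pure single-field vector query.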

For hybrid / cross-field vector / multi-vector queries, the results will be fused with Reciprocal Rank Fusion (RRF). This means kNearestNeighborsCount controls how many results you want to consider for the RRF re-ranking from the vector query part. Since RRF can re-order documents depending on the rank and number of result sets a document will appear in, this parameter value can change the result order/contents.
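To make the re-ordering effect concrete, here is a small Python sketch of Reciprocal Rank Fusion. It uses the textbook formula, score(d) = sum over result sets of 1 / (c + rank), with c = 60 as an assumed constant. The function rrf_fuse, the tie-break by document id and the two example rankings are invented for illustration and are not the service's actual implementation.

```python
from collections import defaultdict


def rrf_fuse(result_lists, c=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings for one hybrid query.
text_ranking = ["x", "y", "z"]         # keyword part
vector_ranking = ["z", "w", "y", "x"]  # full vector ranking, best first

# The vector part contributes only its first k results to the fusion.
fused_k2 = rrf_fuse([text_ranking, vector_ranking[:2]])
fused_k4 = rrf_fuse([text_ranking, vector_ranking[:4]])

print(fused_k2)  # ['z', 'x', 'w', 'y']
print(fused_k4)  # ['z', 'x', 'y', 'w']
```

With k = 2, document w sits in third place. With k = 4, y and x also receive a contribution from the vector list, so y overtakes w even though w's own vector rank is unchanged.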

Hopefully this illustrates how each of the parameters controls the number of results you receive.

https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-query?tabs=query-2023-11-01%2Cfilter-2023-11-01#number-of-ranked-results-in-a-vector-query-response

0Dmitry commented 9 months ago

@robertklee I got it now, thanks for the reply!