Azure / azure-rest-api-specs

The source for REST API specifications for Microsoft Azure.
MIT License

What is the @search.score cutoff below which no results appear? I am getting fewer rows than specified in the top parameter of .search #25319

Open gitprojects619 opened 1 year ago

gitprojects619 commented 1 year ago

I have created an Azure Search index from the dataframe below.
[Image: df for search index]

Scenario 1: search_client.search('stand-up', top=3) returns all 3 rows from the index.
Scenario 2: search_client.search('What do comics do?', top=3) returns only 1 result. (Images are at the end of the question.)

My question: why does the search method not return all 3 rows in Scenario 2, even though I specify top=3? Is there a @search.score threshold that a row must meet in order to be returned? If so, can this threshold be controlled through a parameter of the .search method?

I have already been through the method's source code and don't see any such parameter.

[Image: Return for Scenario 1]
[Image: Return for Scenario 2]

Below is the full code to reproduce this issue:

AZURE_SEARCH_SERVICE = 'to be filled as str'
AZURE_SEARCH_KEY = 'to be filled as str'

from azure.search.documents.indexes import SearchIndexClient
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes.models import *
from azure.search.documents import SearchClient
import pandas as pd
from uuid import uuid4
from azure.search.documents.models import QueryType

def create_search_index(index_name:str)->None:

    index_client = SearchIndexClient(endpoint=f"https://{AZURE_SEARCH_SERVICE}.search.windows.net/",
                                     credential=AzureKeyCredential(AZURE_SEARCH_KEY))

    index = SearchIndex(
        name=index_name,
        fields=[
            SimpleField(name="uuid", type="Edm.String", key=True),
            SimpleField(name="Numb_Str", type="Edm.String", filterable=True, facetable=True),
            SearchableField(name="Sent", type="Edm.String", analyzer_name="en.microsoft"),
            SimpleField(name="Topic", type="Edm.String", filterable=True, facetable=True),
        ],
        semantic_settings=SemanticSettings(
            configurations=[SemanticConfiguration(
                name='default',
                prioritized_fields=PrioritizedFields(
                    title_field=None, prioritized_content_fields=[SemanticField(field_name='Sent')]))])
    )
    print(f"Creating {index_name} search index")
    index_client.create_index(index)

def upload_to_created_index(index_name:str,df:pd.DataFrame)->None:

    search_client = SearchClient(endpoint=f"https://{AZURE_SEARCH_SERVICE}.search.windows.net/",
                                 index_name=index_name,
                                 credential=AzureKeyCredential(AZURE_SEARCH_KEY))
    sections = df.to_dict("records")
    search_client.upload_documents(documents=sections)

#create df for uploading to search index
data = [{'uuid':str(uuid4()),'Numb_Str':'10','Sent':'Stand-up comedy is a comedic performance to a live audience in which the performer addresses the audience directly from the stage','Topic':'Standup'},
        {'uuid':str(uuid4()),'Numb_Str':'20','Sent':'A stand-up defines their craft through the development of the routine or set','Topic':'Standup'},
        {'uuid':str(uuid4()),'Numb_Str':'30', 'Sent':'Experienced stand-up comics with a popular following may produce a special.','Topic':'Standup'}]

df = pd.DataFrame(data)
pd.set_option('display.max_colwidth', None)

#create empty search index
create_search_index("test-simple2")

#upload df to created search index
upload_to_created_index('test-simple2',df)

#query the search index
search_client = SearchClient(
    endpoint=f"https://{AZURE_SEARCH_SERVICE}.search.windows.net",
    index_name='test-simple2',
    credential=AzureKeyCredential(AZURE_SEARCH_KEY))

query_results = search_client.search('What do comics do?',top=3)
query_results = list(query_results)

#get query results in a df
df_results = pd.DataFrame(query_results)

df_results

If I change the .search method's arguments to run a semantic search, I still get only 1 result. I do it as follows:

query_results = search_client.search('What do comics do?',
                                     top=3,
                                     query_type=QueryType.SEMANTIC,
                                     query_language='en-us',
                                     semantic_configuration_name="default")

navba-MSFT commented 1 year ago

Adding service team to look into this.
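
For readers hitting the same behaviour: as far as I understand it, top only caps the number of results, and a full-text query only returns documents that match at least one query term (with the default search mode of "any"), so there is no separate @search.score cutoff involved. Semantic ranking re-scores the documents that the initial query already retrieved, which would explain why it also returns 1 row. The sketch below is a toy model of that matching step in plain Python, not the service's implementation. The names DOCS, tokenize and toy_search are hypothetical, the tokenizer is a crude stand-in for the en.microsoft analyzer, and the overlap count is only a placeholder for the real scoring.

```python
import re

# Same three sentences as the dataframe in the question, keyed by Numb_Str.
DOCS = {
    "10": "Stand-up comedy is a comedic performance to a live audience in which "
          "the performer addresses the audience directly from the stage",
    "20": "A stand-up defines their craft through the development of the routine or set",
    "30": "Experienced stand-up comics with a popular following may produce a special.",
}


def tokenize(text):
    """Toy analyzer (assumption): lowercase and split on non-alphanumerics."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_search(query, top):
    """Return up to `top` document keys sharing at least one term with the query."""
    query_terms = tokenize(query)
    scored = []
    for key, sentence in DOCS.items():
        overlap = len(query_terms & tokenize(sentence))
        if overlap > 0:  # documents with no matching term are never candidates
            scored.append((overlap, key))
    # Rank by overlap as a crude stand-in for @search.score, then cap at `top`.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [key for _, key in scored[:top]]


print(toy_search("stand-up", top=3))            # every sentence contains "stand" and "up"
print(toy_search("What do comics do?", top=3))  # only one sentence contains "comics"
```

In this toy model the second query yields a single key because "what", "do" and "comics" appear in only one of the three sentences, so top=3 has nothing further to return. If the goal is to get all rows back regardless of the terms, a query that matches every document (for example a search text of "*") may be closer to what is wanted.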