The source for REST API specifications for Microsoft Azure.
MIT License
What is the @search.score cutoff below which no results appear? I am getting fewer rows than specified in the top parameter of .search #25319
I have created an Azure search index from a small DataFrame (the full reproduction code, including the data, is further down).
Scenario 1: search_client.search('stand-up', top=3) gives me all 3 rows from the index in the results,
but Scenario 2: search_client.search('What do comics do?', top=3) only gives me 1 result. (Screenshots of both results were attached at the end of the question.)
My question: why is the search method not returning all 3 rows in Scenario 2, even though I specify top=3? Is there a @search.score threshold that a row must meet in order to be returned? If so, can this threshold be controlled through a parameter of the .search method?
I have already been through the method's source code and don't see any such parameter.
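One way to see how top=3 can return fewer rows without any score cutoff is a toy model of keyword retrieval. The sketch below assumes, as a simplification, that a full-text query only returns documents sharing at least one token with the query, and that top merely caps the number of hits. The tokenizer (lowercase, split on non-alphanumerics) and the term-count score are hypothetical stand-ins, not the en.microsoft analyzer or the service's actual scoring.

```python
import re
from typing import List, Tuple

# The three 'Sent' values from the repro DataFrame.
DOCS = [
    "Stand-up comedy is a comedic performance to a live audience in which "
    "the performer addresses the audience directly from the stage",
    "A stand-up defines their craft through the development of the routine or set",
    "Experienced stand-up comics with a popular following may produce a special.",
]


def tokenize(text: str) -> List[str]:
    # Hypothetical analyzer: lowercase, then split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def toy_search(query: str, docs: List[str], top: int) -> List[Tuple[int, int]]:
    """Return up to `top` (doc_index, score) pairs for documents matching any query token.

    Score = number of document tokens that appear in the query (a stand-in, not BM25).
    """
    query_terms = set(tokenize(query))
    hits = []
    for i, doc in enumerate(docs):
        score = sum(1 for tok in tokenize(doc) if tok in query_terms)
        if score > 0:  # a document with no matching token is never a candidate
            hits.append((i, score))
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:top]  # `top` is only an upper bound on the number of hits


print(toy_search("stand-up", DOCS, top=3))            # [(0, 2), (1, 2), (2, 2)]
print(toy_search("What do comics do?", DOCS, top=3))  # [(2, 1)]
```

In this toy model the second query matches only the third sentence, because "comics" is the only query token that occurs in any document; every returned hit has a positive score and nothing is filtered by a threshold. The real service's analyzer handles word forms and common words differently from this tokenizer, so treat it as an illustration rather than a description of the service.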
[Image: Return for Scenario 1]
[Image: Return for Scenario 2]
Below is the full code to reproduce this issue
from azure.search.documents.indexes import SearchIndexClient
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes.models import *
from azure.search.documents import SearchClient
from azure.search.documents.models import QueryType
import pandas as pd
from uuid import uuid4

AZURE_SEARCH_SERVICE = 'to be filled as str'
AZURE_SEARCH_KEY = 'to be filled as str'


def create_search_index(index_name: str) -> None:
    """Create an empty index with one full-text field ('Sent') and a semantic configuration."""
    index_client = SearchIndexClient(endpoint=f"https://{AZURE_SEARCH_SERVICE}.search.windows.net/",
                                     credential=AzureKeyCredential(AZURE_SEARCH_KEY))
    index = SearchIndex(
        name=index_name,
        fields=[
            SimpleField(name="uuid", type="Edm.String", key=True),
            SimpleField(name="Numb_Str", type="Edm.String", filterable=True, facetable=True),
            # Only 'Sent' is searchable; it is analyzed with the en.microsoft analyzer
            SearchableField(name="Sent", type="Edm.String", analyzer_name="en.microsoft"),
            SimpleField(name="Topic", type="Edm.String", filterable=True, facetable=True),
        ],
        semantic_settings=SemanticSettings(
            configurations=[SemanticConfiguration(
                name='default',
                prioritized_fields=PrioritizedFields(
                    title_field=None,
                    prioritized_content_fields=[SemanticField(field_name='Sent')]))])
    )
    print(f"Creating {index.name} search index")
    index_client.create_index(index)


def upload_to_created_index(index_name: str, df: pd.DataFrame) -> None:
    """Upload every DataFrame row as one document."""
    search_client = SearchClient(endpoint=f"https://{AZURE_SEARCH_SERVICE}.search.windows.net/",
                                 index_name=index_name,
                                 credential=AzureKeyCredential(AZURE_SEARCH_KEY))
    sections = df.to_dict("records")
    search_client.upload_documents(documents=sections)


# Create a DataFrame to upload to the search index
data = [{'uuid': str(uuid4()), 'Numb_Str': '10', 'Sent': 'Stand-up comedy is a comedic performance to a live audience in which the performer addresses the audience directly from the stage', 'Topic': 'Standup'},
        {'uuid': str(uuid4()), 'Numb_Str': '20', 'Sent': 'A stand-up defines their craft through the development of the routine or set', 'Topic': 'Standup'},
        {'uuid': str(uuid4()), 'Numb_Str': '30', 'Sent': 'Experienced stand-up comics with a popular following may produce a special.', 'Topic': 'Standup'}]
df = pd.DataFrame(data)
pd.set_option('display.max_colwidth', None)

# Create an empty search index
create_search_index("test-simple2")
# Upload the DataFrame to the newly created index
upload_to_created_index('test-simple2', df)

# Query the search index
search_client = SearchClient(
    endpoint=f"https://{AZURE_SEARCH_SERVICE}.search.windows.net",
    index_name='test-simple2',
    credential=AzureKeyCredential(AZURE_SEARCH_KEY))
query_results = search_client.search('What do comics do?', top=3)
query_results = list(query_results)

# Put the query results in a DataFrame
df_results = pd.DataFrame(query_results)
df_results
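To check whether a score cutoff is involved, it can help to list every hit's score next to its text. The sketch below is a hypothetical helper (`score_table` is not part of the SDK); it assumes each result is a dict carrying an "@search.score" key, which is how results from `SearchClient.search` in azure-search-documents expose the relevance score. Because it works on plain dicts, it can be applied to the `query_results` list from the repro above.

```python
from typing import Any, Dict, Iterable, List, Tuple


def score_table(results: Iterable[Dict[str, Any]], text_field: str = "Sent") -> List[Tuple[float, str]]:
    """Collect (score, text) pairs from search results, highest score first.

    Assumes each result is a dict with an "@search.score" key; `text_field`
    defaults to the repro index's 'Sent' field (hypothetical helper).
    """
    rows = [(float(r["@search.score"]), str(r.get(text_field, ""))) for r in results]
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


# Usage against the repro index (requires a live service):
# for score, text in score_table(query_results):
#     print(f"{score:.4f}  {text}")
#
# To see how many documents matched in total, independent of `top`:
# paged = search_client.search('What do comics do?', top=3, include_total_count=True)
# print(paged.get_count())
```

If the total count printed by `get_count()` already equals the number of rows returned, the missing rows were never matched by the query in the first place, rather than dropped by a score threshold.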
If I change the .search method's arguments to run a semantic search instead, I still get only 1 result. This is the call I use:
query_results = search_client.search('What do comics do?',
                                     top=3,
                                     query_type=QueryType.SEMANTIC,
                                     query_language='en-us',
                                     semantic_configuration_name="default")
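A toy sketch of why switching to semantic ranking might not change the count: assuming, as Azure's documentation describes, that semantic ranking re-scores the documents already returned by the initial keyword (BM25) query (up to the first 50) rather than retrieving new ones, a reranker can reorder that candidate list but never lengthen it. `rerank` and the length-based scorer below are hypothetical stand-ins for the semantic model.

```python
from typing import Callable, List


def rerank(candidates: List[str], scorer: Callable[[str], float], top: int) -> List[str]:
    """Reorder already-retrieved candidates by `scorer` and keep at most `top`.

    Toy stand-in for a semantic reranker: it never adds documents that the
    first-stage (keyword) retrieval did not return.
    """
    return sorted(candidates, key=scorer, reverse=True)[:top]


# Suppose the first-stage keyword query matched only one sentence, as in Scenario 2.
first_stage = ["Experienced stand-up comics with a popular following may produce a special."]

# Hypothetical scorer (sentence length) standing in for a semantic relevance model.
reranked = rerank(first_stage, scorer=len, top=3)
print(len(reranked))  # 1: reranking cannot grow the candidate set
```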