Closed vamsibanda closed 7 months ago
@vamsibanda Have you tried looping through the response.results? The results are paginated, and looping through them accesses all pages.
Example:
for result in response.results:
    print(result)
Also, what is the value of response.totalSize?
The estimated total count of matched items irrespective of pagination. The count of results returned by pagination may be less than the totalSize that matches.
@holtskinner I tried looping with the above search query. The API returned the first set of results, which comprised 25 results, fewer than the specified page_size=50. It also provided metadata indicating a total size of 3740. How can I retrieve the next set of 25 results?
Also, if I aim to obtain another 25 results, will this be considered a separate search request, and will I be charged for two requests?
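On retrieving the next page over REST: the v1 search response carries a nextPageToken field that can be sent back as pageToken in the follow-up request. A minimal sketch of that loop, with the HTTP call factored out as a fetch_page callable (my own stand-in for the actual requests.post call) so the pagination logic stands on its own:

```python
def collect_results(fetch_page, max_results):
    """Accumulate results across pages by threading each response's
    nextPageToken back into the next request as pageToken."""
    results, page_token = [], None
    while len(results) < max_results:
        page = fetch_page(page_token)  # one search request per page
        results.extend(page.get("results", []))
        page_token = page.get("nextPageToken")
        if not page_token:  # no token means there are no more pages
            break
    return results[:max_results]
```

Each iteration here is a separate call to the search endpoint, which is what the question about billing below is getting at.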
@holtskinner Is there any update on this? In particular, I'm interested in whether there is any documentation covering why page_size seems to be ignored for values over 25, when the maximum page_size indicated in the documentation is 100 (link). I can iterate over the SearchPager to get more results, but this causes additional calls to the API (and additional charges) when it seems I should be able to retrieve those results in a single call.
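As a rough check on the cost concern: if each page fetch is billed as one search request, as this thread assumes, then the number of requests needed scales with the effective page cap rather than the requested page_size. A minimal sketch (the function name is my own):

```python
import math

def expected_search_calls(results_needed, page_cap):
    """Number of search requests needed when the service caps each
    page at page_cap results and each page fetch is one billed call."""
    return math.ceil(results_needed / page_cap)
```

With the observed cap of 25, fetching 50 results takes two calls instead of the single call a page_size of 50 would suggest.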
Here is a reproducible example of the API returning only 25 results when 50 are requested.
import requests
import json
import os

# Obtain an access token via gcloud for the Authorization header.
bearer_token = os.popen("gcloud auth print-access-token").read().strip()

PROJECT_ID = "project-id"
LOCATION = "global"
DATA_STORE_ID = "datastore-id"
SERVING_CONFIG = f"projects/{PROJECT_ID}/locations/{LOCATION}/collections/default_collection/dataStores/{DATA_STORE_ID}/servingConfigs/default_serving_config"
ENDPOINT_URL = f"https://discoveryengine.googleapis.com/v1/{SERVING_CONFIG}:search"

query = "test"
page_size = 50

headers = {"Content-Type": "application/json", "Authorization": f"Bearer {bearer_token}"}
body = {
    "query": query,
    "pageSize": page_size,
}

response = requests.post(ENDPOINT_URL, headers=headers, data=json.dumps(body)).json()
print(len(response["results"]))
# 25
The documentation for this endpoint says the maximum count of results is 100, which seems to be contrary to the behavior seen here.
pageSize (integer): Maximum number of Documents to return. If unspecified, defaults to a reasonable value. The maximum allowed value is 100. Values above 100 are coerced to 100. If this field is negative, an INVALID_ARGUMENT is returned.
@holtskinner do you have any updates on this or any proposed workarounds?
@lavinigam-gcp @holtskinner Apologies for the ping, but is there anyone from GCP who is following up on this, or any update on the issue? It seems like a pretty clear deviation between real and documented behavior, and it is having a major impact on my project.
Hi, sorry for the delay. I've reported this to the product dev team, and I'll update here when there are any updates.
@johnmccain Confirmed with the product team that this is the expected behavior.
Website (Basic Indexing) data stores have a maximum page size of 25, Website (Advanced Indexing) data stores have a maximum page size of 50, and unstructured/structured data stores have a maximum page size of 100.
The documentation is incorrect. I'm working on getting that updated.
I updated the documentation here:
This page will be updated soon:
I am currently using Google Vertex AI Website Search capabilities in my application and have encountered an issue: the search functionality often does not return more than 25 results, despite the page_size parameter being set to 50. This occurs regardless of the query used. This limitation is significantly impacting the usability of our search feature.
Could you please assist in resolving this issue? Any guidance or support in this matter would be greatly appreciated.