Azure / azure-search-vector-samples

A repository of code samples for Vector search capabilities in Azure AI Search.
https://azure.microsoft.com/products/search
MIT License
690 stars, 285 forks

Azure Vector Search not working with OpenAI new embedding model "text-embedding-3-large" #157

Closed AVIN8233 closed 4 months ago

AVIN8233 commented 4 months ago

Hi Team, I found that the Azure vector store does not work with OpenAI's new embedding model "text-embedding-3-large", whose embedding vectors have length 3072.

Python code:

    from langchain_community.vectorstores.azuresearch import (
        AzureSearch,
        AzureSearchVectorStoreRetriever,
    )

    model = "text-embedding-3-large"
    embeddings = OpenAIEmbeddings(deployment=model, model=model, dimensions=3072)
    vector_store: AzureSearch = AzureSearch(
        azure_search_endpoint=AZURE_SEARCH_ENDPOINT,
        azure_search_key=AZURE_SEARCH_KEY,
        index_name=index_name,
        embedding_function=embeddings.embed_query,
    )

Error:

    File ~\anaconda3\Lib\site-packages\azure\search\documents\indexes\_generated\operations\_indexes_operations.py:403, in IndexesOperations.create(self, index, request_options, **kwargs)
        401 map_error(status_code=response.status_code, response=response, error_map=error_map)
        402 error = self._deserialize.failsafe_deserialize(_models.SearchError, pipeline_response)
    --> 403 raise HttpResponseError(response=response, model=error)
        405 deserialized = self._deserialize("SearchIndex", pipeline_response)
        407 if cls:

    HttpResponseError: (InvalidRequestParameter) The request is invalid. Details: definition : The vector field 'content_vector' must have the property 'dimensions' set to a positive, non-zero integer between 2 and 2048.
    Code: InvalidRequestParameter
    Message: The request is invalid. Details: definition : The vector field 'content_vector' must have the property 'dimensions' set to a positive, non-zero integer between 2 and 2048.
    Exception Details: (InvalidField) The vector field 'content_vector' must have the property 'dimensions' set to a positive, non-zero integer between 2 and 2048. Parameters: definition
        Code: InvalidField
        Message: The vector field 'content_vector' must have the property 'dimensions' set to a positive, non-zero integer between 2 and 2048. Parameters: definition

Comment: I checked that LangChain supports embedding vectors of length 3072. Can you please check this on the Azure side?

mattgotteiner commented 4 months ago

Yes, unfortunately we don't support vectors of this length yet. You would need to use the dimensions property to reduce it to 2048 or fewer.
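
As a rough illustration of that workaround, the sketch below picks a dimensions value that fits the range quoted in the error message (2 to 2048). The helper name `pick_dimensions` and the constants are hypothetical and not part of LangChain or the Azure SDK. The limits are taken from the error text above and may differ on a service where a higher limit has since been rolled out.

```python
# Limits as quoted in the error message in this thread (assumption: your
# service may accept a higher maximum once a newer limit is available).
MIN_DIMENSIONS = 2
MAX_DIMENSIONS = 2048


def pick_dimensions(requested: int, max_dimensions: int = MAX_DIMENSIONS) -> int:
    """Hypothetical helper: cap a requested embedding size at the index limit."""
    if requested < MIN_DIMENSIONS:
        raise ValueError(
            f"dimensions must be at least {MIN_DIMENSIONS}, got {requested}"
        )
    # Anything above the limit is reduced to the largest accepted value.
    return min(requested, max_dimensions)


dims = pick_dimensions(3072)
print(dims)  # prints 2048
```

The resulting value would then be passed as the `dimensions` argument when constructing the embeddings object, in place of 3072 in the original post, so that the index field and the embedding vectors have the same length.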

farzad528 commented 4 months ago

We are rolling out a fix that raises the maximum dimension limit to 3072. The rollout should be complete by the end of February. Thanks.

AVIN8233 commented 4 months ago

I'm waiting for the fix. Please update this thread when the rollout is complete.

farzad528 commented 4 months ago

Hi, it should be available globally by now! @AVIN8233

AVIN8233 commented 4 months ago

Thanks Farzad, it is working now.

AVIN8233 commented 4 months ago

It is fixed now.

jucastag commented 4 months ago

We are using the Microsoft GPT-RAG repository, and Cognitive Search doesn't allow us to index dimensions higher than 2048. Also, when trying to create the vectorizer while setting the contentVector field, it only allows us to choose the ada-002 model. We have text-embedding-3-large deployed, but it doesn't appear as an option. @farzad528

farzad528 commented 4 months ago

Hi @jucastag, can you link the repo you are referring to? I'll try and see how I can help but at a minimum you should flag an issue.

jucastag commented 4 months ago

Thank you @farzad528, here's the repo: https://github.com/Azure/GPT-RAG. I will flag an issue there as well. I posted here because I was trying to solve the problem and came across this thread. Thank you again.