Azure / azure-search-vector-samples

A repository of code samples for Vector search capabilities in Azure AI Search.
https://azure.microsoft.com/products/search
MIT License
739 stars, 315 forks

Different similarity results with text-embedding-3-small vs ada-002 #166

Closed (kirk-marple closed this issue 7 months ago)

kirk-marple commented 7 months ago

I've tried using the new OpenAI text-embedding-3-small model to create embeddings, and I'm seeing noticeably different vector search results than with ada-002.

Compared to ada-002, it doesn't return the relevant text chunks that I'd expect.
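One way to put a number on "different results" is to compare which chunks each model returns for the same query, rather than eyeballing them. A minimal sketch, assuming you already have the ranked chunk IDs from each index; the IDs and the `top_k_overlap` helper below are hypothetical, not part of any SDK:

```python
def top_k_overlap(ids_a, ids_b, k=5):
    """Fraction of the top-k chunk IDs shared by two ranked result lists."""
    top_a, top_b = set(ids_a[:k]), set(ids_b[:k])
    if not top_a and not top_b:
        return 1.0  # two empty result lists trivially agree
    return len(top_a & top_b) / max(len(top_a), len(top_b))


# Hypothetical chunk IDs returned for one query by each index.
ada_ids = ["c12", "c07", "c31", "c02", "c19"]
small_ids = ["c07", "c44", "c12", "c50", "c02"]

overlap = top_k_overlap(ada_ids, small_ids, k=5)
print(f"top-5 overlap: {overlap:.2f}")
```

A low overlap across many queries would suggest the two models really do rank the chunks differently, independent of how the absolute scores look.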

Also, somewhat strangely, the relevance scores are noticeably lower. With ada-002, the top hits score above 0.8, but with text-embedding-3-small, the top hits only score above 0.6.

I'm using the default HNSW settings, with cosine similarity.
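For anyone comparing numbers like these: as far as I understand the Azure AI Search documentation, the `@search.score` for the cosine metric is not the raw cosine similarity but `1 / (1 + cosine_distance)`, where `cosine_distance = 1 - cosine_similarity`. A rough sketch of converting between the two, assuming the 0.8 and 0.6 values above are `@search.score` values (that is an assumption; the function names are mine):

```python
def search_score_from_cosine(cos_sim: float) -> float:
    """Map cosine similarity in [-1, 1] to a score in [1/3, 1]."""
    return 1.0 / (2.0 - cos_sim)  # same as 1 / (1 + (1 - cos_sim))


def cosine_from_search_score(score: float) -> float:
    """Invert the mapping above: recover cosine similarity from a score."""
    if not (1.0 / 3.0 - 1e-9 <= score <= 1.0 + 1e-9):
        raise ValueError("cosine-metric scores should lie in [1/3, 1]")
    return 2.0 - 1.0 / score


# Assumed to be @search.score values, taken from the comment above.
for score in (0.8, 0.6):
    print(f"score {score} -> cosine similarity {cosine_from_search_score(score):.3f}")
```

Under that assumption, a score of 0.8 corresponds to a cosine similarity of 0.75, and a score of 0.6 to roughly 0.33. Either way, absolute values from two different embedding models aren't directly comparable, so any fixed score threshold would need re-tuning per model.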

I'm not sure why this would be; has anyone seen this difference before?

Maybe this is an OpenAI issue with the new embedding model, but I wanted to raise it here in case it has anything to do with vector search.

kirk-marple commented 7 months ago

Actually, this looks like an issue with the embedding model. Closing this issue.

https://community.openai.com/t/ive-gone-back-to-ada-2-text-embedding-3-small-is-not-working-for-me/616216