Different similarity results with text-embedding-3-small vs ada-002

Azure / azure-search-vector-samples

A repository of code samples for Vector search capabilities in Azure AI Search.

MIT License

I've tried using the new text-embedding-3-small OpenAI model to create embeddings, and I'm seeing rather different results from a vector search.

It doesn't return the relevant text chunks that I'd expect, whereas ada-002 does.
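One way to put a number on "different results" is to run the same query against both indexes and compare the top-k chunk IDs. This is only a minimal sketch: the `overlap_at_k` helper and the chunk IDs are hypothetical and purely illustrative, not output from my indexes.

```python
def overlap_at_k(ids_a: list[str], ids_b: list[str], k: int = 5) -> float:
    """Fraction of the top-k results shared by two ranked ID lists."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_a = set(ids_a[:k])
    top_b = set(ids_b[:k])
    return len(top_a & top_b) / k


# Hypothetical ranked chunk IDs returned for the same query by each index.
ada_ids = ["chunk-1", "chunk-2", "chunk-3", "chunk-4", "chunk-5"]
small_ids = ["chunk-2", "chunk-9", "chunk-1", "chunk-7", "chunk-8"]

print(overlap_at_k(ada_ids, small_ids, k=5))
```

A low overlap across many queries would point to a real ranking difference rather than just a shift in score scale.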

Also, somewhat strangely, the relevance values are noticeably different. With ada-002, the top hits are above 0.8, but with text-embedding-3-small, the top hits are only above 0.6.

I'm using the defaults for HNSW, and cosine similarity.
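For context on the numbers: as far as I understand, when the metric is cosine, Azure AI Search reports `@search.score = 1 / (1 + cosine_distance)` with `cosine_distance = 1 - cosine_similarity`, so the score is not the raw cosine similarity. The sketch below inverts that formula, assuming the relevance values I quoted are `@search.score` values; the function name is my own.

```python
def score_to_cosine_similarity(score: float) -> float:
    """Invert score = 1 / (1 + cosine_distance) to recover cosine similarity.

    Assumes `score` is an Azure AI Search @search.score from a
    cosine-metric vector field.
    """
    if not 0.0 < score <= 1.0:
        raise ValueError("score must be in (0, 1]")
    cosine_distance = (1.0 - score) / score
    return 1.0 - cosine_distance


for score in (0.8, 0.6):
    print(score, round(score_to_cosine_similarity(score), 3))
```

Under that assumption, a score of 0.8 corresponds to a cosine similarity of about 0.75 and a score of 0.6 to about 0.33. Absolute similarity values from two different embedding models are not directly comparable anyway, so the ranking of results is the more meaningful thing to compare.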

Not sure why this could be; has anyone seen this difference before?

Maybe this is an OpenAI issue with the new embedding model, but I wanted to raise it here in case it has anything to do with the vector search.

Different similarity results with text-embedding-3-small vs ada-002 #166