When using the new text-embedding-3 models the scores are a lot lower

ReneReiterer commented 1 week ago

Hey, when I use the new text-embedding-3 models both for creating the embeddings and for querying, I get much lower scores for the same query.

With text-embedding-ada-002, a query might get a result score of 0.8, but with text-embedding-3 the score drops below 0.5, even though the same content is returned. Is there a reason for this?

Stevenic commented 3 days ago

That's a property of the embedding model and nothing I have control over. It suggests they're generating a more diverse range of embeddings. Can you share some examples (the query plus the text it's being compared to)?
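One way to picture the "more diverse range of embeddings" point is a toy model, offered as an illustration only and not as a description of how OpenAI's models actually behave: if every embedding carries the same extra component, all pairwise cosine similarities are pulled toward 1, so scores look high and bunched together; without that shared component the same vectors produce lower, more spread-out scores. The sketch assumes the score is cosine similarity and uses hypothetical 3-dimensional vectors and a hypothetical `with_shared` helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def with_shared(v: list[float], c: float = 1.0) -> list[float]:
    """Hypothetical helper: append the same extra component c to a vector."""
    return v + [c]


# Hypothetical toy "embeddings"; real embeddings have over a thousand dimensions.
query = [1.0, 0.2, 0.0]
docs = {"close": [0.9, 0.3, 0.1], "far": [0.1, 0.2, 1.0]}

# Scores without and with the shared component.
plain = {name: cosine(query, d) for name, d in docs.items()}
offset = {name: cosine(with_shared(query), with_shared(d)) for name, d in docs.items()}

for name in docs:
    print(f"{name}: plain={plain[name]:.3f} shared={offset[name]:.3f}")
```

In this toy, "close" stays ranked above "far" in both cases (about 0.99 vs 0.13 without the shared component, 0.99 vs 0.56 with it), which mirrors the observation that the same content comes back while the absolute scores differ. Adding a shared component does not preserve ranking in general; it just happens to here.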

ReneReiterer commented 3 days ago

Here are the results for the example from the Vectra README:

With "text-embedding-ada-002":

Querying green...
[0.9027890493383421] blue
[0.8750171543194056] red
[0.8316836924030466] apple

Querying banana...
[0.9025824326098169] apple
[0.8489727589250824] oranges
[0.840552337334082] blue

With "text-embedding-3-small":

Querying green...
[0.5587630540517711] blue
[0.4586459570036867] red
[0.3330212746409029] oranges

Querying banana...
[0.463723740085403] apple
[0.36792568686955635] oranges
[0.3011467689281706] blue

With "text-embedding-3-large":

Querying green...
[0.5854194924173858] red
[0.5425629350657741] blue
[0.3589804053636035] oranges

Querying banana...
[0.4618476040380141] apple
[0.39727599664880175] oranges
[0.37006686089236474] blue
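The reported numbers can be tabulated to separate ranking from absolute score. This Python sketch only reuses the scores posted above; the 0.8 cutoff is a hypothetical threshold, chosen because it matches the ada-002 score mentioned in the question, and is not a Vectra setting.

```python
# Scores as reported above: model -> query -> ranked list of (score, item).
results = {
    "text-embedding-ada-002": {
        "green": [(0.9027890493383421, "blue"), (0.8750171543194056, "red"),
                  (0.8316836924030466, "apple")],
        "banana": [(0.9025824326098169, "apple"), (0.8489727589250824, "oranges"),
                   (0.840552337334082, "blue")],
    },
    "text-embedding-3-small": {
        "green": [(0.5587630540517711, "blue"), (0.4586459570036867, "red"),
                  (0.3330212746409029, "oranges")],
        "banana": [(0.463723740085403, "apple"), (0.36792568686955635, "oranges"),
                   (0.3011467689281706, "blue")],
    },
    "text-embedding-3-large": {
        "green": [(0.5854194924173858, "red"), (0.5425629350657741, "blue"),
                  (0.3589804053636035, "oranges")],
        "banana": [(0.4618476040380141, "apple"), (0.39727599664880175, "oranges"),
                   (0.37006686089236474, "blue")],
    },
}

THRESHOLD = 0.8  # hypothetical fixed cutoff matching the ada-002 score in the question

summary = []
for model, queries in results.items():
    for query, hits in queries.items():
        top_score, top_item = hits[0]
        # Spread between the best and the worst of the returned hits.
        gap = top_score - hits[-1][0]
        # Items that would survive a fixed score cutoff.
        kept = [item for score, item in hits if score >= THRESHOLD]
        summary.append((model, query, top_item, gap, kept))
        print(f"{model:24} {query:7} top={top_item:8} gap={gap:.3f} kept={kept}")
```

On these numbers, "banana" returns apple first under all three models, and "green" returns blue first except with text-embedding-3-large, which ranks red first. The gap between the first and third hit is roughly 0.06 to 0.07 for ada-002 versus 0.09 to 0.23 for the text-embedding-3 models, and a 0.8 cutoff keeps every ada-002 hit but none from text-embedding-3.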

Stevenic/vectra, issue #53