embeddings-benchmark / arena

Code for the MTEB Arena
https://hf.co/spaces/mteb/arena

Normalization during retrieval scores computation #34

Open violenil opened 3 months ago

violenil commented 3 months ago

Hi! Loving the Arena for quick inspection of models :)

I noticed that the retrieval scores are computed as dot products rather than cosine similarity, even though the embeddings are not normalized. I manually added normalization in a local deployment and got significantly different results, at least for the jinaai/jina-embeddings-v2-base-en model. Do you think we could add an optional parameter to model_meta.yml to normalize embeddings during the model.encode call? I'm happy to make a PR.
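
For illustration, here is a minimal sketch (not the arena code) of the difference I mean: with unnormalized embeddings, a plain dot-product index can rank documents differently than cosine similarity, and passing `normalize_embeddings=True` to `encode` makes the two agree. The example sentences are made up; the jina-embeddings-v2 models typically also need `trust_remote_code=True` to load via sentence-transformers.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

query = "What is normalization in retrieval?"
docs = [
    "Cosine similarity divides by the vector norms.",
    "Dot-product scores also depend on embedding magnitude.",
]

q = model.encode(query)   # unnormalized by default
d = model.encode(docs)

dot_scores = d @ q        # what a plain dot-product index sees
cos_scores = (d / np.linalg.norm(d, axis=1, keepdims=True)) @ (q / np.linalg.norm(q))
print(dot_scores, cos_scores)  # rankings can differ when document norms vary

# Normalizing inside encode makes a downstream dot product equal cosine similarity:
q_norm = model.encode(query, normalize_embeddings=True)
d_norm = model.encode(docs, normalize_embeddings=True)
print(d_norm @ q_norm)
```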

Muennighoff commented 2 months ago

Thanks! For the live arena we're actually using the GCP index, which normalizes first and then takes the dot product, i.e. cosine similarity: https://github.com/embeddings-benchmark/arena/blob/64a8780d596018912905523406621eed62a9a417/retrieval/gcp_index.py#L160
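
A quick sketch of the identity this relies on: once both vectors are L2-normalized, their dot product is exactly the cosine similarity of the original vectors.

```python
import numpy as np

a, b = np.random.rand(768), np.random.rand(768)
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
dot_of_normalized = (a / np.linalg.norm(a)) @ (b / np.linalg.norm(b))
assert np.isclose(cosine, dot_of_normalized)
```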

We should definitely adapt this for the local index, though. I think it should be done in the models folder, i.e. here: https://github.com/embeddings-benchmark/mteb/tree/main/mteb/models

I think we should probably add Jina in this file: https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/sentence_transformers_models.py and activate normalization there, so the model always uses normalization when loaded via mteb.get_model(... - cc @KennethEnevoldsen
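
Roughly what I have in mind, as a hypothetical sketch rather than the actual mteb registration code: a thin wrapper whose `encode` always returns unit-norm vectors, so any downstream dot-product index effectively computes cosine similarity. The class name and defaults here are illustrative only.

```python
from sentence_transformers import SentenceTransformer


class NormalizedSentenceTransformer:
    """Wraps a SentenceTransformer so every encode call returns unit-norm embeddings."""

    def __init__(self, model_name: str = "jinaai/jina-embeddings-v2-base-en"):
        # jina-embeddings-v2 models typically need trust_remote_code=True.
        self.model = SentenceTransformer(model_name, trust_remote_code=True)

    def encode(self, sentences, **kwargs):
        # Force normalization unless the caller explicitly overrides it.
        kwargs.setdefault("normalize_embeddings", True)
        return self.model.encode(sentences, **kwargs)
```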

KennethEnevoldsen commented 2 months ago

Yep, that is totally correct - the https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/ folder is the gold standard reference for evaluated models.