embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

[Question] why does RerankingEvaluator implementation use embeddings + cos_sim instead of using similarity score from model? #229

Open · chrjxj opened 6 months ago

chrjxj commented 6 months ago

As far as I understand, reranker models typically take a query + doc pair as input and directly output a score; these are the so-called "cross-encoders".


However, when I read the RerankingEvaluator implementation (link), it gets embeddings for the query and the docs and then computes cosine similarity.
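For concreteness, here is a minimal sketch of the two scoring paths using the sentence-transformers API (the model names are illustrative, not necessarily what MTEB uses):

```python
from sentence_transformers import SentenceTransformer, CrossEncoder
from sentence_transformers.util import cos_sim

query = "what is a cross-encoder?"
docs = [
    "A cross-encoder scores a (query, doc) pair jointly.",
    "A bi-encoder embeds query and doc separately.",
]

# Bi-encoder path (what RerankingEvaluator does by default):
# embed query and docs independently, then rank by cosine similarity.
bi = SentenceTransformer("all-MiniLM-L6-v2")
q_emb = bi.encode(query, convert_to_tensor=True)
d_emb = bi.encode(docs, convert_to_tensor=True)
bi_scores = cos_sim(q_emb, d_emb)[0]  # shape: (len(docs),)

# Cross-encoder path: the model sees (query, doc) together and
# outputs a relevance score directly; no embeddings are produced.
ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
ce_scores = ce.predict([(query, d) for d in docs])
```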

I modified the code to use the similarity score directly from the model output. As expected, the evaluation results differ from (are higher than) those of the default RerankingEvaluator implementation.

My question: why does the RerankingEvaluator implementation use embeddings + cos_sim instead of using the similarity score from the model?

Muennighoff commented 6 months ago

In my mind, Reranking is just about reordering texts such that the order is more accurate - it doesn't constrain how you reorder them. Thus, you can also use Bi-Encoders / embedding models as Rerankers; Cross-Encoders are just more common as they tend to give better performance.
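To illustrate the point: once you have one score per (query, doc) pair, reranking is just sorting by that score, regardless of whether the score came from cosine similarity or from a cross-encoder's output head. A minimal sketch with made-up scores:

```python
import numpy as np

docs = ["doc A", "doc B", "doc C"]
# Scores could come from cosine similarity (bi-encoder) or from a
# cross-encoder; reranking only cares about the resulting order.
scores = np.array([0.12, 0.87, 0.45])  # illustrative values
order = np.argsort(-scores)            # highest score first
reranked = [docs[i] for i in order]
# -> ["doc B", "doc C", "doc A"]
```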

It is cool that you have implemented Cross-Encoder support! If you want, you can open a PR and we can merge it :) I can also add it to the leaderboard, though I will probably need to make a distinction between Bi- and Cross-Encoders (e.g. the Chinese Reranking leaderboard already has a Cross-Encoder in it, I think).