embeddings-benchmark / arena

Code for the MTEB Arena
https://hf.co/spaces/mteb/arena

Make the arena a competitive arXiv/Wikipedia/StackExchange search engine #44

Open Muennighoff opened 1 month ago

Muennighoff commented 1 month ago

The problem is that we're not getting enough votes. People use the LMSYS Chatbot Arena because it is useful in itself, e.g. to play with models they cannot otherwise access and to help them solve problems. Our arena is harder to make useful in that way: the corpora are fixed and people cannot easily add their own large corpus, so it is more constrained.

@shaoyijia suggested incentivizing more people to vote by making the arena genuinely useful, e.g. as a research/learning partner. For example, if we position it as a better arXiv search than arXiv's native search (which most people consider poor), people may be curious to try it. They would also have more incentive to vote if, after choosing a winner, we showed the top-k results from the winning model to help them learn more. Currently, people may not have much incentive to vote/play if all they see is the below. (paraphrasing Yijia's comments here)

(screenshot of the current arena output)

Some concrete things we can do:

KennethEnevoldsen commented 1 month ago

I think this is great! These changes could also work for Wikipedia.

We could have it highlight the answer in the retrieved document, either using a dedicated model (though this introduces a bias), or by embedding segments of the document and seeing which segment best matches the query. This is a non-trivial change, though.
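A minimal sketch of the segment-matching idea: split the retrieved document into segments, embed each, and highlight the segment closest to the query. The `embed` function below is a bag-of-words placeholder so the sketch is self-contained; in the arena it would presumably be the winning embedding model itself, which avoids introducing a second model's bias.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Placeholder embedding: a sparse bag-of-words vector.
    # In practice, swap in the embedding model under evaluation.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_segment(query: str, document: str, seg_len: int = 2) -> str:
    # Split the document into overlapping windows of `seg_len` sentences,
    # embed each window, and return the one most similar to the query.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    segments = [
        " ".join(sentences[i:i + seg_len])
        for i in range(max(1, len(sentences) - seg_len + 1))
    ]
    q = embed(query)
    return max(segments, key=lambda s: cosine(q, embed(s)))
```

The returned segment could then be highlighted in the displayed document, with no extra reranking model involved.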