Open KennethEnevoldsen opened 1 month ago
By fold-down menu you mean an accordion right?
It was my initial idea yes, but I suppose multiple things could work - tabs would also be an option:
That ain't dumb! might try that one then.
multiply scores by 100 and keep one decimal, e.g. 78.1 (@orionw not sure if this also works for followIR?)
It does work for FollowIR!
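For reference, the formatting rule proposed above (multiply by 100, keep one decimal) could look roughly like this. It is only a minimal sketch: `format_score` is a hypothetical helper, not a function from mteb, and it assumes scores are stored as floats on a 0 to 1 scale.

```python
def format_score(score: float) -> str:
    """Render a raw score as a percentage-style string with one decimal.

    Hypothetical helper for illustration; assumes `score` is on a 0-1 scale.
    """
    return f"{score * 100:.1f}"


# Example usage with a made-up score value.
print(format_score(0.7812))
```

Doing the rounding only at display time would keep the stored values at full precision, so sorting and averaging are unaffected by the formatting.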
Also is the v2 leaderboard up somewhere or is this a picture from development?
It's still in development. I'm using the leaderboard_2 branch for new changes. You can run it by:
from mteb.leaderboard import demo
demo.launch()
I can host a demo version on my HF profile btw if it's something we'd be interested in having @orionw
Ah, no problem @x-tabdeveloping! For some reason I misunderstood and thought it was already up. Thanks for the offer, but no need to add extra work during your development. It’s looking great already though! 🚀
Here's a demo of the current version: https://huggingface.co/spaces/kardosdrur/mmteb_leaderboard_demo
Thanks for sharing the dev version!
The leaderboard looks really amazing! Probably already planned but
trained_on_{task_name}_{task_split}: true
or training_datasets: [(Emotion, train), (Amazon, test), ...]
or something else) and invite users to update the metadata via PR)
Total Datasets: 213
Total Languages: 113
Total Scores: 88857
Total Models: 469
(could be auto-displayed per-benchmark when selecting a benchmark)
@Muennighoff I'm on it!
Hey @Muennighoff what does "Total scores" mean?
"Total scores" is the total number of scores, i.e. how many numbers there are in the table. Maybe there's a better name for it 🤔
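To make that definition concrete, here is a rough sketch of counting the total as the number of filled-in cells in a results table. The table layout (model name mapped to per-task scores, with `None` for missing results) and the name `count_scores` are assumptions for illustration, not how the leaderboard actually stores or counts its data.

```python
from typing import Dict, Optional

# Assumed layout: model name -> task name -> score (None if not evaluated).
ScoreTable = Dict[str, Dict[str, Optional[float]]]


def count_scores(table: ScoreTable) -> int:
    """Count every non-missing score cell across all models and tasks."""
    return sum(
        1
        for task_scores in table.values()
        for score in task_scores.values()
        if score is not None
    )


# Made-up example data: two models, two tasks, one missing result.
example: ScoreTable = {
    "model-a": {"task-1": 0.78, "task-2": 0.65},
    "model-b": {"task-1": 0.81, "task-2": None},
}

print(count_scores(example))
```

Filtering the table to the tasks of a selected benchmark before counting would give the per-benchmark figure mentioned above.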
Might be worth moving integration with Arena to a separate issue (It might work well with #1432). I think it might warrant some more discussion. To begin with we could also add it to the description of MTEB(eng, beta). Something like:
"English also has an arena-style benchmark for evaluating embeddings. You can check this out here".
A couple of comments for readability:
Originally posted by @KennethEnevoldsen in https://github.com/embeddings-benchmark/mteb/issues/1312#issuecomment-2435013987