embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Improve leaderboard 2.0 readability #1317

Open KennethEnevoldsen opened 1 month ago

KennethEnevoldsen commented 1 month ago

A couple of comments for readability:

Originally posted by @KennethEnevoldsen in https://github.com/embeddings-benchmark/mteb/issues/1312#issuecomment-2435013987

x-tabdeveloping commented 1 month ago

By fold-down menu you mean an accordion, right?

KennethEnevoldsen commented 1 month ago

That was my initial idea, yes, but I suppose multiple things could work. Tabs would also be an option:

[Screenshot 2024-10-24 at 13 51 54]

x-tabdeveloping commented 1 month ago

That ain't dumb! Might try that one then.

orionw commented 1 month ago

> multiply scores by 100 and keep one decimal, e.g. 78.1 (@orionw not sure if this also works for followIR?)

It does work for FollowIR!
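
For reference, the formatting rule quoted above (multiply by 100, keep one decimal) could look roughly like the sketch below. It assumes raw scores are stored as floats in the 0 to 1 range; `format_score` is a hypothetical helper name for illustration, not something taken from the mteb codebase.

```python
def format_score(score: float) -> str:
    """Render a raw score as a percentage-style string with one decimal.

    Assumption: `score` is a float on a 0-1 scale. The helper name is
    hypothetical and not part of mteb.
    """
    return f"{score * 100:.1f}"


print(format_score(0.781))
```

Returning a string keeps the trailing zero (for example `50.0`), so every column entry shows the same number of decimals.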

Also is the v2 leaderboard up somewhere or is this a picture from development?

x-tabdeveloping commented 1 month ago

It's still in development. I'm using the leaderboard_2 branch for new changes. You can run it with:

from mteb.leaderboard import demo

demo.launch()

x-tabdeveloping commented 1 month ago

I can host a demo version on my HF profile, by the way, if that's something we'd be interested in having @orionw

orionw commented 1 month ago

Ah, no problem @x-tabdeveloping! For some reason I misunderstood and thought it was already up. Thanks for the offer, but no need to add extra work during your development. It’s looking great already though! 🚀

x-tabdeveloping commented 1 month ago

Here's a demo of the current version: https://huggingface.co/spaces/kardosdrur/mmteb_leaderboard_demo

tomaarsen commented 4 weeks ago

Thanks for sharing the dev version!

Muennighoff commented 2 weeks ago

The leaderboard looks really amazing! Probably already planned but

x-tabdeveloping commented 2 weeks ago

@Muennighoff I'm on it!

x-tabdeveloping commented 2 weeks ago

Hey @Muennighoff, what does "Total scores" mean?

Muennighoff commented 2 weeks ago

"Total scores" is the total number of scores, i.e. how many numbers there are in the table. Maybe there's a better name for it 🤔
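
As a rough illustration of that definition, counting "how many numbers there are in the table" might be sketched as below. The table layout (model name mapped to per-task scores, with `None` for a missing result), the model and task names, and the `total_scores` helper are all hypothetical. Whether missing cells should be excluded is also an assumption.

```python
from typing import Dict, Optional

# Hypothetical table layout: model name -> {task name -> score or None}.
Table = Dict[str, Dict[str, Optional[float]]]


def total_scores(table: Table) -> int:
    """Count the filled-in score cells; None marks a missing result (assumption)."""
    return sum(
        1
        for task_scores in table.values()
        for score in task_scores.values()
        if score is not None
    )


example_table: Table = {
    "model-a": {"TaskX": 78.1, "TaskY": 64.0},
    "model-b": {"TaskX": 71.3, "TaskY": None},  # model-b has no TaskY result
}

print(total_scores(example_table))
```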

KennethEnevoldsen commented 2 weeks ago

It might be worth moving integration with Arena to a separate issue (it might work well with #1432). I think it warrants some more discussion. To begin with, we could also add it to the description of MTEB(eng, beta). Something like:

"English also has an arena-style benchmark for evaluating embeddings. You can check this out here".

x-tabdeveloping commented 2 weeks ago

I'm a bit stopped in my tracks because of glaring issues with Gradio's dataframes (1, 2). I have implemented the plot though, and will add overview info to the benchmarks' descriptions.