embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

paper writing: table overview of tasks #897

KennethEnevoldsen opened this issue 4 weeks ago (status: Open)

KennethEnevoldsen commented 4 weeks ago

See https://github.com/embeddings-benchmark/mteb/discussions/595#discussioncomment-9467714

KennethEnevoldsen commented 4 weeks ago

Assigned to @Sakshamrzt

Sakshamrzt commented 3 weeks ago
[Screenshot from 2024-06-17, 1:59:54 PM]

@KennethEnevoldsen Having a language column seems necessary, but it can lead to problems like the one shown above. What do you think should be done here?

KennethEnevoldsen commented 3 weeks ago

Ah, yes, that seems a bit frustrating. One solution would be to truncate the list, e.g. ["acm", "afr", "als", ...], showing at most n=10 languages. That does not convey the total number of languages, though, so I would add a column with the count, something like:

| N. Languages |
| --- |
| 107 ("acm", "afr", "als", ...) |

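A minimal Python sketch of the proposed cell format (count first, then at most n language codes, then "..."). It assumes each task's languages are available as a plain list of three-letter code strings; `format_languages` and `max_shown` are hypothetical names, not part of the mteb API, and the deduplication and alphabetical sorting are also assumptions.

```python
def format_languages(languages: list[str], max_shown: int = 10) -> str:
    """Hypothetical helper: render a language list as '<count> ("a", "b", ...)'.

    Assumes `languages` is a list of language-code strings; duplicates are
    dropped and codes are sorted alphabetically (both are assumptions).
    """
    unique = sorted(set(languages))
    # Show only the first `max_shown` codes, quoted as in the proposal above.
    shown = ", ".join(f'"{code}"' for code in unique[:max_shown])
    # Append an ellipsis only when some codes were cut off.
    if len(unique) > max_shown:
        shown += ", ..."
    return f"{len(unique)} ({shown})"


if __name__ == "__main__":
    print(format_languages(["acm", "afr", "als", "amh"], max_shown=3))
    # prints: 4 ("acm", "afr", "als", ...)
```

Putting the count first keeps the size of a multilingual task visible even when the list itself is cut short.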
Sakshamrzt commented 2 weeks ago

Sounds good! Thanks a lot for your input.

KennethEnevoldsen commented 2 weeks ago

Had a look at the table. Looking good so far!

A few suggestions (after which I think we are getting there):