embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Paper Writing: Introduction [only missing final revisions] #1004

Open KennethEnevoldsen opened 3 days ago

KennethEnevoldsen commented 3 days ago

I did a rewrite/correction/update of the introduction.

@mariyahendriksen and @imenelydiaker: Do you guys have time to take a look at the introduction?

mariyahendriksen commented 3 days ago

Sure, will start on it tonight!

mariyahendriksen commented 2 days ago

I reviewed the introduction. Let me know if you want me to do another pass over it or focus on something else!

KennethEnevoldsen commented 1 day ago

Thanks @mariyahendriksen. I have gone over the changes and resolved all that I found appropriate (most of them). We will probably still need a few minor additions once we get the results in but I believe this is good enough to close this issue for now.

KennethEnevoldsen commented 1 day ago

Actually, before I close it, @mariyahendriksen, could you take a final look at my corrections? I also found quite a nice section in "speeding up the benchmark":

Since MMTEB aims, among other things, to evaluate low-resource languages, it is especially important to make the benchmark accessible to low-resource communities, given the known co-occurrence of compute constraints and low-resource languages. This issue is often termed the \textit{low-resource double bind} \citep{ahia-etal-2021-low-resource}.

The last part in particular would probably be good to include in the introduction as well.