embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0
1.83k stars 246 forks source link

Fix GritLM instructions #976

Closed Muennighoff closed 3 months ago

Muennighoff commented 3 months ago

if we care about exactly reproducing e5 & grit, I think we have another bug which is that they do not use an instruction for documents, but only queries for Retrieval. One solution could be to distinguish corpus & query prompts like here: https://github.com/ContextualAI/gritlm/blob/da37ccbace1aa4f4bf26273e4e9a7cea705ae951/evaluation/eval_mteb.py#L66 and then instead try to get the respective prompt in encode_corpus & encode_queries --- What do you think?

Fixes https://github.com/embeddings-benchmark/mteb/issues/973