embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Fix GritLM Retrieval instructions #981

Closed Muennighoff closed 3 months ago

Muennighoff commented 3 months ago

Issue mentioned in https://github.com/embeddings-benchmark/mteb/pull/976

KennethEnevoldsen commented 3 months ago

Looks good @Muennighoff - I have added a fix for the e5 models as well

Muennighoff commented 3 months ago

I think for e5 it was already fine because it didn't use the instruction anyway, but it's also fine to change it like you did 👍

        if encode_type == "query":
            sentences = [
                f"Instruction: {instruction}\nQuery: {sentence}"
                for sentence in sentences
            ]
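
To make the point above concrete, here is a minimal standalone sketch of the same branching. The function name `format_inputs` and the `"passage"` label are hypothetical and used only for illustration; this is not the actual mteb wrapper. It assumes that anything other than a query is passed through unchanged, which is why an instruction set on the document side would have had no effect.

```python
from typing import List


def format_inputs(sentences: List[str], instruction: str, encode_type: str) -> List[str]:
    """Hypothetical helper mirroring the snippet quoted above.

    Only queries get the instruction prepended; every other encode_type
    (for example documents/passages) is returned without an instruction.
    """
    if encode_type == "query":
        return [
            f"Instruction: {instruction}\nQuery: {sentence}"
            for sentence in sentences
        ]
    # Assumption: non-query inputs are left as-is, so the instruction is ignored.
    return list(sentences)


queries = format_inputs(["what is mteb?"], "Retrieve relevant passages", "query")
passages = format_inputs(["MTEB is a benchmark."], "Retrieve relevant passages", "passage")
```

Under that assumption, removing the instruction for documents changes nothing in the output, but it makes the intent explicit.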

KennethEnevoldsen commented 3 months ago

Sorry, I was unclear: I changed it for the e5 instruct models only.

Muennighoff commented 3 months ago

Oh sorry, I was unclear too. I also meant that for e5 instruct I think it was already fine because of the code above. But anyway, it's cleaner to explicitly not have an instruction like you changed, so all good as is I think! 😁

KennethEnevoldsen commented 3 months ago

Oh, double misunderstanding there. Well, no worries; clarity is probably a good thing in this case, as you say.