embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Split Matryoshka model results where applicable #310

Closed · raffaeler closed this issue 1 month ago

raffaeler commented 6 months ago

Regarding Matryoshka embedding models, I would love to see separate lines (for the same Matryoshka model) for each vector length. It would also be valuable to tag Matryoshka models in a separate column listing all the available lengths.
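
For context, a Matryoshka-trained model emits one long vector whose leading prefixes are themselves usable embeddings, so each "vector length" here is simply a truncation of the same output. A minimal sketch of that idea, using dummy data rather than any particular model:

```python
import numpy as np

# Dummy full-length output of a hypothetical Matryoshka model (768 dims).
full_embedding = np.random.rand(768).astype(np.float32)

# Each supported length is just a prefix of the full vector,
# re-normalized so cosine similarity remains meaningful.
for dim in (64, 128, 256, 512, 768):
    truncated = full_embedding[:dim]
    truncated /= np.linalg.norm(truncated)
    print(dim, truncated.shape)
```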

KennethEnevoldsen commented 6 months ago

Thanks @raffaeler, can you outline how you imagine the table might look?

raffaeler commented 6 months ago

Thanks for the prompt answer @KennethEnevoldsen.

Given that MTEB is a leaderboard, I believe each length should be on a separate line, as if it were a different model. Also, since a single model has multiple lengths, the Embedding Dimensions column should list all the vector lengths, with the one being measured in bold.

This is just an idea, but I don't believe it is possible to aggregate the results for all lengths of the same model into a single line; otherwise the other columns would have to contain multiple values, which is confusing.
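
A purely illustrative mock-up of that layout (the model name is hypothetical and the scores are left as placeholders):

| Model | Embedding Dimensions | Avg. score |
|---|---|---|
| my-matryoshka-model | **64** / 256 / 768 | … |
| my-matryoshka-model | 64 / **256** / 768 | … |
| my-matryoshka-model | 64 / 256 / **768** | … |

Each row reports results for the dimension shown in bold, while the column still lists every length the model supports.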

raffaeler commented 6 months ago

@KennethEnevoldsen I would also add a column indicating whether the model is multimodal.

This is not related to Matryoshka; please let me know if you want me to open a separate issue.

KennethEnevoldsen commented 6 months ago

We already have the embedding size on the benchmark, and people can add the same model twice (e.g. as MyModel (emb_size=512)).
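
For anyone wanting to produce such per-length entries today, a rough sketch (assuming a recent sentence-transformers release with the `truncate_dim` argument and the usual `MTEB(tasks=...).run(...)` API; the model and task names below are placeholders):

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

MODEL_NAME = "your-org/your-matryoshka-model"  # placeholder model id

# Run the same model at several truncation lengths and keep the results
# in separate folders, so each length can be submitted as its own entry
# (e.g. "MyModel (emb_size=512)").
for dim in (64, 256, 512):
    model = SentenceTransformer(MODEL_NAME, truncate_dim=dim)
    evaluation = MTEB(tasks=["Banking77Classification"])  # example task
    evaluation.run(model, output_folder=f"results/{MODEL_NAME.split('/')[-1]}-{dim}d")
```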

Please open a new issue for the multimodal column, and specify there why it is important.

Generally, we should probably create more detailed model metadata. While the dashboard can't accommodate everything, it should be easy to compare models on relevant tasks. This is already being discussed in #314.

isaac-chung commented 1 month ago

Closing this based on the response above.