raffaeler closed this 1 month ago.
Thanks @raffaeler, can you outline what you imagine the table would look like?
Thanks for the prompt answer @KennethEnevoldsen.
Given that MTEB is a leaderboard, I believe each length should be on a separate row, as if it were a different model.
Anyway, since a single model has several lengths, the Embedding Dimensions column should list all of its vector lengths, with the one being measured in bold.
This is just an idea, but I don't believe it is possible to aggregate the results for all of a model's lengths in a single row; otherwise the other columns would have to contain multiple values, which is confusing.
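A minimal sketch of the row layout proposed above, assuming the scores for each evaluated length are already available in a dict. The names Row, format_dims_cell and expand_rows are hypothetical and not part of MTEB; bold is rendered with Markdown `**` because the leaderboard table is assumed to accept Markdown.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One leaderboard row: a single model evaluated at a single embedding length."""
    model: str
    dims_cell: str
    score: float


def format_dims_cell(all_dims: list[int], measured: int) -> str:
    """List every supported length (largest first), bolding the one that was measured."""
    parts = [f"**{d}**" if d == measured else str(d) for d in sorted(all_dims, reverse=True)]
    return ", ".join(parts)


def expand_rows(model: str, scores_by_dim: dict[int, float]) -> list[Row]:
    """Emit one row per evaluated length, as if each length were a separate model."""
    all_dims = list(scores_by_dim)
    return [
        Row(model, format_dims_cell(all_dims, dim), score)
        for dim, score in sorted(scores_by_dim.items(), reverse=True)
    ]
```

Each row keeps a single value in every other column, which is the point of not aggregating lengths into one row.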
@KennethEnevoldsen I would also add a column telling whether the model is multimodal or not.
This is not related to Matryoshka, please let me know if you want me to open a separate issue.
We already show the embedding size on the benchmark, and people could add the same model twice (e.g. as MyModel (emb_size=512)).
Please open a new issue about the multimodal column, and in that issue also explain why it is important.
Generally, we should probably create more detailed model metadata. While the dashboard can't accommodate everything, it should be easy to compare models on relevant tasks. This was already discussed in #314.
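One possible shape for richer per-model metadata, offered only as a sketch: a small record carrying the multimodal flag suggested earlier and the Matryoshka lengths discussed in this thread. The class ModelMeta and all of its fields are hypothetical and do not describe MTEB's actual schema.

```python
from dataclasses import dataclass


@dataclass
class ModelMeta:
    """Hypothetical model-metadata record; field names are illustrative only."""
    name: str
    embed_dims: list[int]        # every output length the model supports
    matryoshka: bool = False     # can the embedding be truncated to the shorter lengths?
    multimodal: bool = False     # flag proposed in the thread (to be its own issue)

    def matryoshka_cell(self) -> str:
        """Text for a 'Matryoshka' column: available lengths, or empty for other models."""
        if not self.matryoshka:
            return ""
        return ", ".join(str(d) for d in sorted(self.embed_dims, reverse=True))
```

Keeping these as plain fields lets the dashboard show only a few columns while the full record stays available for filtering.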
Closing this based on the response above.
With regard to Matryoshka embedding models, I would love to see separate rows (for the same Matryoshka model) for each vector length. It would also be valuable to tag Matryoshka models in a separate column listing all the available lengths.
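Matryoshka models are trained so that a prefix of the full embedding is itself a usable embedding, which is why one model can appear at several lengths. A minimal sketch of producing the vector for one row, assuming the common approach of keeping the first `dim` components and L2-renormalizing; whether a given model or evaluation renormalizes is an assumption here, and truncate_embedding is a hypothetical helper.

```python
import math


def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components of a Matryoshka embedding and L2-renormalize them."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    # An all-zero prefix cannot be normalized, so return it unchanged.
    return prefix if norm == 0 else [x / norm for x in prefix]
```

For cosine-similarity tasks the renormalization does not change the scores, but it keeps dot-product results comparable across lengths.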