embeddings-benchmark / leaderboard

Code for the MTEB leaderboard
https://hf.co/spaces/mteb/leaderboard

Fix models with empty results #8

Closed orionw closed 1 month ago

orionw commented 1 month ago

Both NV-Retriever and E5-R-Mistral-7B show up with empty results.

orionw commented 1 month ago

I was wrong: NV-Retriever is now on the leaderboard; it just has Retrieval results only.


E5-R-Mistral is not showing up fully because of CQADupstackRetrieval.


orionw commented 1 month ago

cc'ing @BeastyZ as an FYI, although it appears that the metadata is right. Is that correct, @KennethEnevoldsen?

bschifferer commented 1 month ago

@orionw can you explain how the Model Size and Memory are populated? I looked at the README.md of other models, but I did not find any metadata related to it.

orionw commented 1 month ago

@bschifferer that's in our model_meta.yaml file; we don't currently have an automatic way of doing that. If you want to add yours, that would be awesome! Or just let me know here. Thanks for the reminder to add that to the documentation.

rnyak commented 1 month ago

@orionw hello! Thanks for guiding us. I opened this PR; would you mind reviewing it? Please let us know if we need to do anything else.

rnyak commented 1 month ago

@bschifferer for visibility.

KennethEnevoldsen commented 1 month ago

> cc @KennethEnevoldsen @orionweller This is regarding the PawsXPairClassification (fr) key not being found.

Will take a look at this issue tomorrow.

Re. CQADupstackRetrieval: it seems to be an aggregate score across a few tasks. We can add that to the create_meta CLI or directly to the leaderboard. It seems that previously it has been included in the frontmatter of the .md, so create_meta is probably the best option.
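
For illustration, a minimal sketch of that aggregation (this is not the actual mteb/create_meta code; the sub-task names, the ndcg_at_10 metric, and the result-file layout are all assumptions):

```python
# Rough sketch: derive the aggregate CQADupstackRetrieval score by averaging
# the main metric (assumed to be ndcg_at_10) over the individual sub-tasks.
import json
from pathlib import Path

# Sub-task names are assumptions based on the MTEB task list.
CQADUPSTACK_SUBTASKS = [
    "CQADupstackAndroidRetrieval",
    "CQADupstackEnglishRetrieval",
    "CQADupstackGamingRetrieval",
    "CQADupstackGisRetrieval",
    "CQADupstackMathematicaRetrieval",
    "CQADupstackPhysicsRetrieval",
    "CQADupstackProgrammersRetrieval",
    "CQADupstackStatsRetrieval",
    "CQADupstackTexRetrieval",
    "CQADupstackUnixRetrieval",
    "CQADupstackWebmastersRetrieval",
    "CQADupstackWordpressRetrieval",
]

def aggregate_cqadupstack(results_dir: Path) -> float:
    """Average the test ndcg_at_10 over all CQADupstack sub-task result files."""
    scores = []
    for task in CQADUPSTACK_SUBTASKS:
        with open(results_dir / f"{task}.json") as f:
            result = json.load(f)
        # The JSON layout here is an assumption for illustration.
        scores.append(result["test"]["ndcg_at_10"])
    return sum(scores) / len(scores)
```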

orionw commented 1 month ago

Thanks @KennethEnevoldsen. Yes, @BeastyZ, we have a script for this in mteb.

BeastyZ commented 1 month ago

Thanks @orionw. I got the score of CQADupstackRetrieval with your script and have updated the Model Card of e5-R-mistral-7b. Could you refresh the space now, or let me know the next scheduled refresh time?

orionw commented 1 month ago

Thanks @BeastyZ! It refreshes once a day or on commit to main. I just made a small PR for another change, so you should see updates soon once it finishes refreshing.

orionw commented 1 month ago

Looks like both models are on the leaderboard. Closing the issue.

BeastyZ commented 1 month ago

@rnyak @bschifferer @KennethEnevoldsen @orionw @guenthermi I am truly grateful for your help!

tomaarsen commented 1 month ago

> @orionw can you explain how the Model Size and Memory are populated? I looked at the README.md of other models, but I did not find any metadata related to it.

> @bschifferer that's in our model_meta.yaml file; we don't currently have an automatic way of doing that. If you want to add yours, that would be awesome! Or just let me know here. Thanks for the reminder to add that to the documentation.

Model Size and Memory are automatically computed via https://github.com/embeddings-benchmark/leaderboard/blob/main/utils/model_size.py#L12. This method looks at the safetensors/pytorch_model.bin file in your repository, so it will be blank until you've released the model. An alternative is indeed the model_meta.yaml file, but that was primarily meant for proprietary models that aren't on Hugging Face or don't have metadata on HF. I see that NV-Retriever-v1 has also been added via #9, which will work fine.
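
For reference, a rough sketch of the idea (this is not the leaderboard's actual utils/model_size.py; the file filter and the GB-from-file-size heuristic are assumptions for illustration):

```python
# Estimate a model's memory footprint from the weight files hosted on the
# Hugging Face Hub. Returns None when no weight files exist yet, i.e. the
# model hasn't been released, which would leave the column blank.
from huggingface_hub import HfApi

def estimate_memory_gb(repo_id: str) -> float | None:
    info = HfApi().model_info(repo_id, files_metadata=True)
    weight_bytes = sum(
        (sibling.size or 0)
        for sibling in info.siblings
        if sibling.rfilename.endswith((".safetensors", ".bin"))
    )
    if weight_bytes == 0:
        return None
    return round(weight_bytes / 1024**3, 2)

if __name__ == "__main__":
    print(estimate_memory_gb("intfloat/e5-mistral-7b-instruct"))
```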

orionw commented 1 month ago

Thanks @tomaarsen for the clarification. Glad to see I was wrong :)

KennethEnevoldsen commented 1 month ago

For future reference: I have added a fix to include CQADupstackRetrieval in future versions using mteb create_meta.

@orionw has also fixed the PawsXPairClassification issue. This should hopefully make leaderboard additions easier in the future.