NanoBeirEvaluator takes in empty dataset

JINO-ROHIT commented 3 days ago

I noticed that NanoBEIREvaluator accepts an empty list of datasets and raises no error during instantiation. But when the model gets passed in, the resulting error is confusing. So:

Is it fine to allow the evaluator to take in an empty dataset list?
Should we change the error message to say "invalid dataset" or something similar?

Who can help? @tomaarsen

Snippet reproduction

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

datasets = []

# The model must be defined, otherwise the call below fails with a NameError instead
model = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-safetensors")

evaluator = NanoBEIREvaluator(  # this bit here returns no error
    dataset_names=datasets,
)

results = evaluator(model)  # this raises an error

#####################################################################

Error log

{
    "name": "KeyError",
    "message": "'cosine_ndcg@10'",
    "stack": "---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[4], line 1
----> 1 evaluator(model)

sentence_transformers\\evaluation\\NanoBEIREvaluator.py:351, in NanoBEIREvaluator.__call__(self, model, output_path, epoch, steps, *args, **kwargs)
    348 if not self.primary_metric:
    349     if self.main_score_function is None:
    350         score_function = max(
--> 351             [(name, agg_results[f\"{name}_ndcg@{max(self.ndcg_at_k)}\"]) for name in self.score_function_names],
    352             key=lambda x: x[1],
    353         )[0]
    354         self.primary_metric = f\"{score_function}_ndcg@{max(self.ndcg_at_k)}\"
    355     else:

sentence-transformers\\sentence_transformers\\evaluation\\NanoBEIREvaluator.py:351, in <listcomp>(.0)
    348 if not self.primary_metric:
    349     if self.main_score_function is None:
    350         score_function = max(
--> 351             [(name, agg_results[f\"{name}_ndcg@{max(self.ndcg_at_k)}\"]) for name in self.score_function_names],
    352             key=lambda x: x[1],
    353         )[0]
    354         self.primary_metric = f\"{score_function}_ndcg@{max(self.ndcg_at_k)}\"
    355     else:

KeyError: 'cosine_ndcg@10'"
}
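
To make the traceback easier to read, here is a minimal sketch of the failing lookup in isolation. It assumes that `agg_results` ends up as an empty dict when no datasets are given (this is inferred from the traceback, not checked against the library source); `score_function_names` and `ndcg_at_k` are stand-in values chosen to match the key in the error message.

```python
# Stand-in values, assumed to mirror the evaluator's state for dataset_names=[]
agg_results = {}                   # assumption: nothing was aggregated
score_function_names = ["cosine"]  # matches the key in the KeyError message
ndcg_at_k = [10]                   # matches the key in the KeyError message

missing_key = None
try:
    # Same expression as line 351 of the traceback, with `self.` removed
    score_function = max(
        [(name, agg_results[f"{name}_ndcg@{max(ndcg_at_k)}"]) for name in score_function_names],
        key=lambda x: x[1],
    )[0]
except KeyError as err:
    missing_key = err.args[0]
    print(f"KeyError: {missing_key!r}")
```

Under that assumption the lookup fails on the first metric name, which is why the message mentions `cosine_ndcg@10` and says nothing about the dataset list being empty.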

tomaarsen commented 3 days ago

Hello!

Well spotted, I think we should indeed add an error stating that dataset_names cannot be an empty list. It can be None, which will default to a full list, but it shouldn't be empty.
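
A minimal sketch of what such a check could look like, written as a standalone function so it runs without the library. The function name `resolve_dataset_names`, the placeholder default list, and the error wording are all hypothetical and are not taken from the sentence-transformers code base.

```python
from typing import List, Optional

# Hypothetical placeholder names, not the library's real default list
DEFAULT_DATASET_NAMES = ["dataset_a", "dataset_b"]


def resolve_dataset_names(dataset_names: Optional[List[str]] = None) -> List[str]:
    """Return the dataset names to evaluate on, rejecting an empty list early."""
    if dataset_names is None:
        # None means "use the full default list"
        return list(DEFAULT_DATASET_NAMES)
    if len(dataset_names) == 0:
        # Fail at construction time with a message that names the real problem
        raise ValueError(
            "dataset_names cannot be an empty list. "
            "Pass None to use the default datasets, or provide at least one dataset name."
        )
    return list(dataset_names)
```

With a check like this in the constructor, `dataset_names=[]` would fail immediately with a ValueError instead of surfacing later as a KeyError when the evaluator is called.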

Tom Aarsen

JINO-ROHIT commented 3 days ago

Cool, I'll be away for a few days, I'm going back to school for my graduation :)

I'll be back and raise a PR for this.

tomaarsen commented 5 hours ago

Congratulations! 🤗

UKPLab / sentence-transformers

NanoBeirEvaluator takes in empty dataset #3103