UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Degree of Relevancy NDCG Calculation #1349

Open ncoop57 opened 2 years ago

ncoop57 commented 2 years ago

Hi there, really enjoy the library and have been using it a ton for my research!

I had a question regarding the NDCG calculation in the InformationRetrievalEvaluator: https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/evaluation/InformationRetrievalEvaluator.py#L230

```python
...
# NDCG@k
for k_val in self.ndcg_at_k:
    predicted_relevance = [1 if top_hit['corpus_id'] in query_relevant_docs else 0 for top_hit in top_hits[0:k_val]]
    true_relevances = [1] * len(query_relevant_docs)

    ndcg_value = self.compute_dcg_at_k(predicted_relevance, k_val) / self.compute_dcg_at_k(true_relevances, k_val)
    ndcg[k_val].append(ndcg_value)
...
```
1. From what I've seen, NDCG is usually computed with graded relevance scores (e.g. on a 0-3 scale) rather than binary relevance values. Is there any work/interest in adding this to the IR evaluator?

If there is interest in adding graded relevance, I am happy to collaborate on a PR (I will probably need some help to avoid breaking compatibility and to keep the API easy to use). Let me know 🤓!
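
For concreteness, here is a minimal sketch of what a graded-relevance NDCG could look like. The dict-of-grades format for `relevant_docs` and the helper names are hypothetical, not the evaluator's current API:

```python
import math

def dcg_at_k(relevances, k):
    # Graded DCG: gain rel_i / log2(rank + 1) over the top-k ranks
    # (enumerate is 0-indexed, hence log2(i + 2)).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(top_hits, relevant_docs, k):
    # top_hits: ranked corpus ids; relevant_docs: dict mapping id -> grade (e.g. 0-3).
    predicted = [relevant_docs.get(doc_id, 0) for doc_id in top_hits[:k]]
    ideal = sorted(relevant_docs.values(), reverse=True)  # best possible ordering
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(predicted, k) / idcg if idcg > 0 else 0.0

# Graded judgments instead of a binary relevant-set:
relevant_docs = {"Id1": 3, "Id5": 2, "Id7": 1}
print(ndcg_at_k(["Id2", "Id1", "Id5"], relevant_docs, k=3))  # ≈ 0.61
```

With all grades set to 1, this reduces to the evaluator's current binary computation, which should make backward compatibility straightforward.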

nreimers commented 2 years ago

Hi @ncoop57, yes, such an extension would make sense. I would be happy to see a PR :)

Maybe this function could also help: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.dcg_score.html

Sklearn also has an ndcg_score function, but I was not able to use it, as it is quite difficult to apply when documents are missing from the ranking.

E.g. these are the true relevant docs: Id1 Id5 Id7

Your model retrieves: Id2 Id1 Id5

Here Id7 is relevant but never retrieved, so there is no predicted score for it to pass to sklearn. This missing-values issue can maybe be solved when sklearn.metrics.dcg_score is used instead, but careful tests would be needed, especially for these edge cases (the model does not retrieve all relevant docs).
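
As a rough illustration of one possible workaround, a sketch (untested against the evaluator; the model scores and the `-1.0` padding sentinel are made-up assumptions) that builds one shared candidate set so unretrieved docs still appear in the vectors sklearn expects:

```python
import numpy as np
from sklearn.metrics import ndcg_score

# Toy data from the example above; the model scores are invented.
relevant = {"Id1": 1, "Id5": 1, "Id7": 1}          # graded judgments would also work here
retrieved = {"Id2": 0.9, "Id1": 0.8, "Id5": 0.7}   # model's top hits with their scores

# One shared candidate set, so Id7 (relevant but not retrieved)
# and Id2 (retrieved but not relevant) are both represented.
candidates = sorted(set(relevant) | set(retrieved))

y_true = np.array([[relevant.get(doc, 0) for doc in candidates]])
# Unretrieved docs get a sentinel score below any real model score,
# pushing them to the bottom of the induced ranking.
y_score = np.array([[retrieved.get(doc, -1.0) for doc in candidates]])

print(ndcg_score(y_true, y_score, k=3))  # ≈ 0.53
```

Whether this padding behaves sensibly at scale (e.g. when many relevant docs are missing and tie among the sentinel scores) is exactly the kind of edge case that would need careful tests.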