benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License
3.57k stars 612 forks

Fix `ranking_metrics_at_k()` #591

Closed ita9naiwa closed 1 year ago

ita9naiwa commented 2 years ago

This PR resolves a few issues.

It also adds MRR and Precision as new metrics.

ita9naiwa commented 2 years ago

Hi @benfred, could you check and review this PR? It resolves the inaccurate NDCG and MRR values of the ranking_metrics_at_k function.
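For readers comparing numbers, here is a minimal sketch of the textbook per-user definitions of reciprocal rank and binary-relevance NDCG@K. It is not the code from this PR or from implicit; the helper names `reciprocal_rank` and `ndcg_at_k` and the toy data are assumptions made for illustration only.

```python
import math


def reciprocal_rank(ranked_items, liked):
    """Return 1 / rank of the first liked item, or 0.0 if none is recommended."""
    for rank, item in enumerate(ranked_items, start=1):
        if item in liked:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_items, liked, k):
    """Binary-relevance NDCG@K: DCG of the list divided by the DCG of an ideal list."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked_items[:k], start=1)
        if item in liked
    )
    # An ideal list places every liked item first, up to K positions.
    ideal_hits = min(k, len(liked))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical user: liked items appear at ranks 2 and 5.
ranked = [10, 11, 12, 13, 14]
liked = {11, 14}

rr = reciprocal_rank(ranked, liked)
ndcg = ndcg_at_k(ranked, liked, k=5)
print(rr, ndcg)
```

MRR is then the mean of `rr` over all evaluated users, and the reported NDCG is the mean of the per-user `ndcg` values.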

ita9naiwa commented 2 years ago

Changes:


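from implicit.als import AlternatingLeastSquares
from implicit.evaluation import ranking_metrics_at_k, train_test_split

# ratings: a scipy.sparse user-item matrix, assumed to be loaded beforehand
tr, te = train_test_split(ratings, random_state=1541)
model = AlternatingLeastSquares(random_state=1541, factors=30, iterations=10)
model.fit(tr)
ranking_metrics_at_k(model, tr, te, K=100)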

As is:

{'precision': 0.3349958296821056,
 'map': 0.12534890653797998,
 'ndcg': 0.2686550155007732,
 'auc': 0.6093577862786992}

To be:

{'precision': 0.07930221607727832,
 'recall': 0.3349958296820349,
 'map': 0.06293165699220135,
 'ndcg': 0.2686550155007732,
 'auc': 0.6093577862785867,
 'mrr': 0.5348017396151994}

I think the definition of MAP should follow that of precision.

thomasjungblut commented 2 years ago

@benfred any ETA on getting this in and released? I was debugging a model yesterday that had weird evaluation results and came to the same conclusion as @ita9naiwa.

malonsocortes commented 2 years ago

Hi @ita9naiwa. I was checking the code for your fix on ranking_metrics_at_k and I'm not sure about the way you define the denominator of Precision. You're using the size of the user's liked items on the test set, but shouldn't it be K, the number of recommended items? K would include True Positives + False Positives, which is what I have normally seen in the definitions I have read of precision. Correct me if I'm wrong, I'd appreciate your opinion on the issue. Thanks!
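To make the question concrete, here is a small sketch contrasting the two denominators for one hypothetical user. It does not reproduce the PR's code; `hits_at_k`, `precision_over_k`, `ratio_over_likes`, and the toy data are illustrative assumptions.

```python
def hits_at_k(recommended, liked, k):
    """Count how many of the top-k recommended items the user actually liked."""
    return sum(1 for item in recommended[:k] if item in liked)


def precision_over_k(recommended, liked, k):
    """Precision@K with K as the denominator (TP + FP = K recommended items)."""
    return hits_at_k(recommended, liked, k) / k


def ratio_over_likes(recommended, liked, k):
    """Same hit count divided by the number of liked test items instead of K."""
    return hits_at_k(recommended, liked, k) / len(liked)


# Hypothetical user: 10 recommendations, 3 liked test items, 2 of them recommended.
recommended = [5, 8, 1, 9, 2, 7, 3, 6, 4, 0]
liked = {8, 3, 42}

p_k = precision_over_k(recommended, liked, k=10)
p_likes = ratio_over_likes(recommended, liked, k=10)
print(p_k, p_likes)
```

With these inputs the K-based value is 0.2, while dividing by the number of liked items gives about 0.67, which is the quantity usually called recall.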

Blo0dR0gue commented 1 year ago

And the divisor for Recall is also wrong. It should always be likes.size(), not k when k is smaller. Dividing by k would only inflate the score and not return the true recall value. Or am I wrong?
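A short sketch of the difference, assuming a single hypothetical user; `recall_at_k` and its `cap_at_k` flag are made up for illustration and are not part of implicit.

```python
def recall_at_k(recommended, liked, k, cap_at_k=False):
    """Recall@K for one user.

    cap_at_k=False divides by the number of liked items (plain recall).
    cap_at_k=True divides by min(k, number of liked items) instead.
    """
    hits = sum(1 for item in recommended[:k] if item in liked)
    denom = min(k, len(liked)) if cap_at_k else len(liked)
    return hits / denom if denom else 0.0


# Hypothetical user with 4 liked items, evaluated at K=2; both top-2 items are hits.
recommended = [1, 2, 3, 4]
liked = {1, 2, 7, 8}

true_recall = recall_at_k(recommended, liked, k=2)
capped_recall = recall_at_k(recommended, liked, k=2, cap_at_k=True)
print(true_recall, capped_recall)
```

Here only half of the liked items were retrieved, so plain recall is 0.5, while the capped denominator reports 1.0.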