Hi @benfred,
can you check and review this PR?
It resolves the inaccurate NDCG and MRR values returned by the ranking_metrics_at_k function.
Changes:
# Assumed setup: `ratings` is a user-item sparse matrix (e.g. scipy.sparse.csr_matrix)
from implicit.als import AlternatingLeastSquares
from implicit.evaluation import train_test_split, ranking_metrics_at_k

tr, te = train_test_split(ratings, random_state=1541)
model = AlternatingLeastSquares(random_state=1541, factors=30, iterations=10)
model.fit(tr)
ranking_metrics_at_k(model, tr, te, K=100)
As is:
{'precision': 0.3349958296821056,
'map': 0.12534890653797998,
'ndcg': 0.2686550155007732,
'auc': 0.6093577862786992}
To be:
{'precision': 0.07930221607727832,
'recall': 0.3349958296820349,
'map': 0.06293165699220135,
'ndcg': 0.2686550155007732,
'auc': 0.6093577862785867,
'mrr': 0.5348017396151994}
I guess the definition of MAP should follow precision.
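For reference, here is a minimal sketch (illustrative only, not the library's implementation; the function name and arguments are mine) of AP@K built from precision@k at each hit position, which is what "MAP should follow precision" refers to:

def average_precision_at_k(recommended, relevant, k):
    # recommended: ranked list of item ids; relevant: set of held-out item ids
    hits = 0
    score = 0.0
    for i, item in enumerate(recommended[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)   # precision@(i+1) at this hit position
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0

MAP@K is then the mean of this value over all test users.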
@benfred any ETA on getting this in and released? I was debugging a model yesterday that had weird evaluation results and came to the same conclusion as @ita9naiwa.
Hi @ita9naiwa. I was checking the code for your fix on ranking_metrics_at_k and I'm not sure about the way you define the denominator of Precision. You're using the size of the user's liked items in the test set, but shouldn't it be K, the number of recommended items? K counts true positives plus false positives, which matches the definitions of precision I have normally seen. Correct me if I'm wrong; I'd appreciate your opinion on this. Thanks!
And the divisor for Recall also looks wrong. It should always be the size of the user's liked set (likes.size()), not K when K is smaller; dividing by the smaller K only inflates the score and does not return the true recall value. Or am I wrong?
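For comparison, a minimal sketch (illustrative only, not implicit's code; the function name is mine) of precision@K and recall@K with the denominators described in the two comments above:

def precision_recall_at_k(recommended, relevant, k):
    # recommended: ranked list of item ids; relevant: set of held-out item ids
    hits = len(set(recommended[:k]) & relevant)
    precision = hits / k                                  # TP / (TP + FP): denominator is K
    recall = hits / len(relevant) if relevant else 0.0    # TP / all relevant test items
    return precision, recall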
This PR resolves a few issues:
412: "precision" in ranking_metrics_at_k is actually "recall". I guess it's fine to update precision and recall since this library already took a major breaking update (0.5.0).
545: ranking_metrics_at_k raises a ValueError if K > num_items.
This PR also adds MRR and Precision as new metrics.
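For the new MRR metric, a minimal illustrative sketch (the function name and signature are mine, not implicit's API) of reciprocal rank within the top-K:

def reciprocal_rank_at_k(recommended, relevant, k):
    # 1/rank of the first relevant item in the top-K recommendations, or 0 if none is hit
    for i, item in enumerate(recommended[:k]):
        if item in relevant:
            return 1.0 / (i + 1)
    return 0.0

MRR is the mean of this value over all test users.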