cheungdaven / DeepRec

An Open-source Toolkit for Deep Learning based Recommendation with Tensorflow.

A question about evaluate() in RankingMetrics #32

Closed: zhboner closed this issue 3 years ago

zhboner commented 4 years ago

According to the code, evaluate() generates scores for the sampled negative items, then ranks them and takes the top-k items for later evaluation.

However, in map_mrr_ndcg() and precision_recall_ndcg_at_k(), the variable hits is computed by checking whether any of the sampled negative items appears in the test data (see the sketch below). If the sampled negative set does not contain the test item, hits will be 0 regardless of the model's quality. This is ridiculous and significantly distorts the evaluation results, especially when only 100 negative samples are drawn at random.
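For concreteness, here is a minimal sketch of the pattern being described; the names score_fn, neg_items, and test_items are hypothetical stand-ins, not the repo's actual identifiers. The point is that only the sampled negatives are ranked, so hits can be non-zero only if the test item was accidentally drawn as a "negative":

```python
import numpy as np

def hit_at_k_negatives_only(score_fn, user, neg_items, test_items, k=10):
    """Sketch of the questioned protocol: rank ONLY the sampled
    negatives, then count how many of the top-k fall in the test set."""
    scores = np.array([score_fn(user, i) for i in neg_items])
    top_k = np.asarray(neg_items)[np.argsort(-scores)[:k]]
    # hits stays 0 for ANY model unless the test item happened
    # to be sampled into neg_items.
    return sum(1 for i in top_k if i in set(test_items))
```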

A model's measured performance therefore depends heavily on whether it is lucky enough that its sampled negatives happen to include the test items.
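For comparison, the usual sampled-metrics protocol for leave-one-out evaluation (e.g., the one popularized by the NCF paper) appends the held-out test item to the sampled negatives before ranking, so the target is always a candidate. A hedged sketch, reusing the hypothetical names above:

```python
def hit_at_k_with_target(score_fn, user, neg_items, test_item, k=10):
    """Standard leave-one-out protocol: rank the test item together
    with the sampled negatives; a hit means the target made top-k."""
    candidates = list(neg_items) + [test_item]
    scores = np.array([score_fn(user, i) for i in candidates])
    top_k = np.asarray(candidates)[np.argsort(-scores)[:k]]
    return int(test_item in top_k)
```

Under this protocol the metric still has sampling variance, but it no longer collapses to 0 simply because the target was not drawn.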