[🐛BUG] Why are Hit@10 and Hit@20 so high for the Pop and BPR models on ml-1m?

RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library

https://recbole.io/

MIT License

[🐛BUG] Why are Hit@10 and Hit@20 so high for the Pop and BPR models on ml-1m? #1774

Open cyxg7 opened 1 year ago

cyxg7 commented 1 year ago

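When I run Pop and BPR on ml-1m, their Hit@10 and Hit@20 are extremely high, far above some existing deep learning models. SASRec only reaches 23, while these two reach Pop 29.387 / 43.295 and BPR 33.411 / 49.47 (Hit@10 / Hit@20). NDCG@10 and NDCG@20 look normal.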

cyxg7 commented 1 year ago

Looking forward to your reply. Thank you very much!

Paitesanshi commented 1 year ago

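@cyxg7 Thank you for your interest in RecBole. This may be caused by an imbalanced data distribution. Some items are very popular and widely liked: a small number of items have a large number of interaction records, while most items have few or none. In that situation, the Pop and BPR models can reach high Hit@10 and Hit@20 simply by recommending according to item popularity or random sampling.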
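To illustrate the popularity argument above, here is a minimal sketch on synthetic data. It is not RecBole code and does not use ml-1m. The function name `simulate_pop_hit`, the Zipf-like weights, and the sizes are all assumptions chosen for illustration, and the sketch does not filter out items a user has already seen.

```python
import random
from collections import Counter


def simulate_pop_hit(num_users=500, num_items=1000, k=10, skewed=True, seed=0):
    """Hit@k of a 'recommend the most popular items' baseline on synthetic data.

    Hypothetical helper for illustration only; not part of RecBole.
    """
    rng = random.Random(seed)
    items = list(range(num_items))
    if skewed:
        # Zipf-like popularity: item i is drawn with weight 1 / (i + 1).
        weights = [1.0 / (i + 1) for i in items]
    else:
        # Uniform popularity: every item is equally likely.
        weights = [1.0] * num_items

    train_counts = Counter()
    test_items = []
    for _ in range(num_users):
        # 20 training interactions plus 1 held-out test item per user.
        history = rng.choices(items, weights=weights, k=21)
        train_counts.update(history[:-1])
        test_items.append(history[-1])

    # The same k most frequent training items are recommended to every user.
    top_k = {item for item, _ in train_counts.most_common(k)}
    hits = sum(1 for target in test_items if target in top_k)
    return hits / num_users


if __name__ == "__main__":
    print("skewed :", simulate_pop_hit(skewed=True))
    print("uniform:", simulate_pop_hit(skewed=False))
```

Under these assumptions the skewed setting gives a much higher Hit@10 than the uniform one, even though the recommender ignores the user entirely. Whether this fully accounts for the numbers reported in this thread is not established here.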

cyxg7 commented 1 year ago

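Thank you for your reply! Is there any way to process the dataset to mitigate this problem? I want to compare my own model against these baseline models, but the values that Pop and BPR produce are too high. I also noticed that the SINE model produces especially low values, even though the parameter settings are identical. Looking forward to your reply. Many thanks!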

zyx1017 commented 1 year ago

Same situation here.

yzy945 commented 6 months ago

I suspect the Hit calculation may not be accurate, so I am reading through the code. Sadly, the URL referenced in the Hit class no longer opens:

    class Hit(TopkMetric):
        r"""HR_ (also known as truncated Hit-Ratio) is a way of calculating how many 'hits'
        you have in an n-sized list of ranked items. If there is at least one item that falls
        in the ground-truth set, we call it a hit.

        .. _HR: https://medium.com/@rishabhbhatia315/recommendation-system-evaluation-metrics-3f6739288870
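For readers who cannot open the link, the definition quoted in the docstring can be sketched in a few lines of plain Python. This is an illustration of that sentence only, not RecBole's implementation. The names `hit_at_k` and `mean_hit_at_k` are hypothetical.

```python
def hit_at_k(ranked_items, ground_truth, k):
    """Return 1 if at least one of the top-k ranked items is in the ground-truth set, else 0."""
    return int(any(item in ground_truth for item in ranked_items[:k]))


def mean_hit_at_k(rankings, ground_truths, k):
    """Average hit_at_k over users; both arguments are lists aligned by user."""
    per_user = [hit_at_k(r, g, k) for r, g in zip(rankings, ground_truths)]
    return sum(per_user) / len(per_user)


if __name__ == "__main__":
    ranking = [3, 1, 2, 7]  # items ordered by predicted score, best first
    print(hit_at_k(ranking, {7}, k=3))     # one held-out item
    print(hit_at_k(ranking, {7, 2}, k=3))  # two held-out items
```

One consequence of this definition is that a user with several held-out items registers a hit more easily than a user with a single held-out item, because any one of them landing in the top k is enough. That may be relevant when comparing models evaluated under different data splits, although this thread does not confirm it as the cause.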