Inconsistent results on the ml-1m dataset

enoche commented 2 years ago

Hi, thanks for sharing the code. Running your unmodified source code (dropout: 0.5, neg-weight: 0.5) on ml-1m, I get the following results (first column: Recall, second column: NDCG):

loss,loss_no_reg,loss_reg -20357.158033288044 -20357.158033288044 0.0
TopK: [10, 20, 50]
Recall@10: 0.1 NDCG@10: 0.04893161658667473
Recall@20: 0.16258278145695365 NDCG@20: 0.06466930639306495
Recall@50: 0.29817880794701984 NDCG@50: 0.09132394256584671

This is much lower than the numbers in the README: NDCG@5, 10, 20 = 0.2457, 0.2475, 0.2656.

Could you help me reproduce your results on ml-1m? Thanks.
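For readers comparing numbers like these, here is a minimal sketch of how Recall@K and NDCG@K are commonly computed for a single user with binary relevance. It is an illustration only, not the repository's evaluation code: the function names and toy data are hypothetical, and implementations differ in details (for example, some normalise recall by min(K, number of held-out items)), which can by itself shift reported values.

```python
import math

def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of a user's held-out items that appear in the top-k ranking."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)

def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical toy example: five ranked items, three held-out items.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "z"}
print(recall_at_k(ranked, relevant, 5))
print(ndcg_at_k(ranked, relevant, 5))
```

Per-user values are then usually averaged over all test users before being reported.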

chenchongthu commented 2 years ago

The results NDCG@5, 10, 20 = 0.2457, 0.2475, 0.2656 are on the ml-lcfn dataset.

To compare with LCFN, you need to use the ml-lcfn dataset, which is the same data used in "Graph Convolutional Network for Recommendation with Low-pass Collaborative Filters".

enoche commented 2 years ago

Noted with thanks. I will try ml-lcfn.

enoche commented 2 years ago

After running on ml-lcfn (with dropout: 0.5, neg-weight: 0.5), I got:

499 Updating: time=0.42
loss,loss_no_reg,loss_reg -14270.39457370924 -14270.39457370924 0.0
TopK: [10, 20, 50]
0.17474698281831363 0.24197656885939905
0.26642073600099603 0.2590238286022946
0.42833159703000406 0.30848090465690464

NDCG@10: 0.24197656885939905 < 0.2475
NDCG@20: 0.2590238286022946 < 0.2656

Not the same as reported in the README, but acceptable.

chenchongthu commented 2 years ago

Have you read the README carefully? What embedding size did you use? For a fair comparison, we also set the embedding size to 128, which is the value used in the LCFN work.

enoche commented 2 years ago

Noted! Here are the results:

499 Updating: time=0.73
loss,loss_no_reg,loss_reg -15902.035453464674 -15902.035453464674 0.0
TopK: [10, 20, 50]
0.18050760514556305 0.24640283029068843
0.27622081694706263 0.26572967980513473
0.44025214134898943 0.31696308408475754

NDCG@10: 0.24640283029068843 < 0.2475
NDCG@20: 0.26572967980513473 > 0.2656

Much better!

chenchongthu commented 2 years ago

So I am curious how the results compare with SelfCF. Could you send me a copy of your data, already split into training and test sets?

enoche commented 2 years ago

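Attached is the amazon-games data after 5-core processing. The x_label column marks train/valid/test (0/1/2). The split follows the global time order (the same as SelfCF). You can test it on your side and see how it performs.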

games_processed.csv
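A minimal sketch of how a file labelled this way could be separated into its three parts, using only the standard library. Only the x_label column and its 0/1/2 meaning come from the description above; the other column names (user_id, item_id, timestamp) and the sample rows are hypothetical stand-ins, since the actual header of the attachment is not shown here.

```python
import csv
import io

# Hypothetical sample; only the x_label column name comes from the thread.
SAMPLE = """user_id,item_id,timestamp,x_label
u1,i1,100,0
u1,i2,200,0
u2,i1,300,1
u2,i3,400,2
"""

# Mapping stated in the comment: 0 = train, 1 = valid, 2 = test.
SPLIT_NAMES = {"0": "train", "1": "valid", "2": "test"}

def split_by_label(csv_text):
    """Group (user, item) pairs by the split recorded in x_label."""
    splits = {name: [] for name in SPLIT_NAMES.values()}
    for row in csv.DictReader(io.StringIO(csv_text)):
        split_name = SPLIT_NAMES[row["x_label"]]
        splits[split_name].append((row["user_id"], row["item_id"]))
    return splits

splits = split_by_label(SAMPLE)
print({name: len(rows) for name, rows in splits.items()})
```

For a real file, the same function could be fed the text read from disk instead of SAMPLE.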

chenchongthu commented 2 years ago

Thanks, I gave it a quick run with these settings:

parser.add_argument('--dropout', type=float, default=0.5, help='dropout keep_prob')
parser.add_argument('--negative_weight', type=float, default=0.05,

At epoch 100 the results are: R@20=0.0764 R@50=0.1323 N@20=0.0367 N@50=0.0511

This looks much better than SelfCF? 🤔 R@20=0.0509 R@50=0.0913 N@20=0.0250 N@50=0.0350
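To put the gap in numbers, here is a small sketch of the relative improvement per metric. It assumes the first row of figures is the run with this repository's code and the second row is the SelfCF result, as the comment lays them out; the variable names are hypothetical.

```python
# Values copied from the two result rows in the comment above.
enmf_run = {"R@20": 0.0764, "R@50": 0.1323, "N@20": 0.0367, "N@50": 0.0511}
selfcf = {"R@20": 0.0509, "R@50": 0.0913, "N@20": 0.0250, "N@50": 0.0350}

def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline

gains = {metric: relative_gain(enmf_run[metric], selfcf[metric]) for metric in enmf_run}
for metric, gain in gains.items():
    print(f"{metric}: {gain:+.1%}")
```

On these figures the gain is roughly 45% to 50% on every metric.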

enoche commented 2 years ago

Yes, that result is indeed good. Could you share the code? Thanks! Are you still using TensorFlow on your side?

chenchongthu commented 2 years ago

The code is simply the code on GitHub.

enoche commented 2 years ago

OK, understood. Your earlier data used a last-one split, whereas this data is split along the global timeline. Does that make no difference to the program?
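The two protocols mentioned here can be sketched as follows. This is an illustration under assumptions, not code from either project: the toy interactions, the function names, and the train_ratio parameter are hypothetical, and the validation part is left out for brevity.

```python
from collections import defaultdict

# Hypothetical (user, item, timestamp) interactions.
INTERACTIONS = [
    ("u1", "i1", 1), ("u1", "i2", 5), ("u1", "i3", 9),
    ("u2", "i1", 2), ("u2", "i4", 3), ("u2", "i5", 4),
]

def last_one_split(interactions):
    """Hold out each user's most recent interaction as the test item."""
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))
    train, test = [], []
    for user, events in by_user.items():
        events.sort()  # oldest first
        train += [(user, item) for _, item in events[:-1]]
        test.append((user, events[-1][1]))
    return train, test

def global_time_split(interactions, train_ratio=0.8):
    """Sort all interactions by time and cut once, at the same point for everyone."""
    ordered = sorted(interactions, key=lambda event: event[2])
    cut = int(len(ordered) * train_ratio)
    train = [(user, item) for user, item, _ in ordered[:cut]]
    test = [(user, item) for user, item, _ in ordered[cut:]]
    return train, test

lo_train, lo_test = last_one_split(INTERACTIONS)
gt_train, gt_test = global_time_split(INTERACTIONS)
print(lo_test)
print(gt_test)
```

In this toy data every user has exactly one test item under the last-one split, while under the global-timeline split u2 has no test items at all because all of its activity falls before the cut. Differences of this kind are one reason metrics from the two protocols are not directly comparable.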

chenchongthu commented 2 years ago

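No difference. As long as the training and test sets are placed in the directory, it runs directly. I have also converted your data into a format I can use directly on my side and uploaded it: https://github.com/chenchongthu/ENMF/tree/master/data/game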

enoche commented 2 years ago

OK, thank you very much!

chenchongthu commented 2 years ago

You're welcome. Feel free to get in touch anytime.

enoche commented 2 years ago

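Thank you very much for your patient answers. I just checked and found that I gave you the wrong data: the file above is a small test sample, and the full data is below. By the way, do you have WeChat (mine is 303432874)? Could we add each other so it is easier to stay in touch? Thanks!

games5_processed.csv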

chenchongthu / ENMF

Inconsistent results on the ml-1m dataset #9