Shen-Lab / GraphCL

[NeurIPS 2020] "Graph Contrastive Learning with Augmentations" by Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, Yang Shen
MIT License

The variance in Table 3 of the original paper should be at least 10x larger #22

Closed · Somedaywilldo closed this issue 3 years ago

Somedaywilldo commented 3 years ago

Dear Authors,

I've been reproducing your work and the original GFN work (ResGCN) for months. The idea is inspiring, but I think some of the reported results are not accurate: your variance is too small compared with standard baselines.

This is the result of GFN (ResGCN) trained with 100% of the training data, as reported in the GFN paper:

[screenshot: GFN (ResGCN) results reported in the GFN paper]

This is my reproduced result for ResGCN (the backbone you are using) with the GFN default settings; it is quite close to the results reported by GFN:

[screenshot: reproduced ResGCN results]

And these are your results, from Table 3 of your paper:

[screenshot: Table 3 of the GraphCL paper]

Some variance is inevitable when 10-fold cross-validation is used, even with a fixed random seed. I also have results reproduced with your original code; here are some of the NCI1 numbers as an example:

[screenshot: reproduced NCI1 results using the GraphCL code]

So I think the std should be at least 10x larger if 10-fold cross-validation is being used correctly. Do you have an explanation for this discrepancy? Otherwise I can't use these results. Thanks!

yyou1996 commented 3 years ago

Hi @Somedaywilldo,

Thanks for your interest in our work. Because of the higher variance on the TU datasets, the mean & std we report are computed over 5 pretrained models, each represented by its average 10-fold accuracy, as described in https://github.com/Shen-Lab/GraphCL/tree/master/semisupervised_TU#graphcl-with-sampled-augmentations. This evaluation setting is motivated by https://arxiv.org/abs/1908.01000. Hope that makes sense; if you have any other questions, just let me know. Thanks!
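To make the protocol concrete, here is a minimal sketch (illustrative NumPy only, not the actual evaluation script in this repo; the names and numbers are made up). Each of the 5 pretrained models is first reduced to its average 10-fold accuracy, and the reported mean/std are then taken across those 5 per-model averages. Averaging over folds first shrinks the std relative to a fold-level std, which accounts for the gap you observed.

```python
import numpy as np

def report_mean_std(per_model_fold_accs):
    """per_model_fold_accs: array of shape (5, 10) -- 5 pretrained models x 10 fold
    accuracies (hypothetical variable name and shape)."""
    accs = np.asarray(per_model_fold_accs, dtype=float)
    per_model_avg = accs.mean(axis=1)            # average 10-fold acc per pretrained model
    return per_model_avg.mean(), per_model_avg.std()

# Illustrative numbers only (randomly generated, not measured results):
rng = np.random.default_rng(0)
fold_accs = 0.74 + 0.02 * rng.standard_normal((5, 10))
mean, std = report_mean_std(fold_accs)
print(f"Reported mean/std over 5 model averages: {mean:.4f} +/- {std:.4f}")
print(f"Std over all 50 fold accuracies:         {fold_accs.std():.4f}")  # noticeably larger
```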

Somedaywilldo commented 3 years ago

Oh, I see. That's really helpful, thank you!