Closed (chencsgit closed this issue 3 years ago)
The Citeseer and Pubmed datasets are fine; only the Cora dataset has this problem.
The dataset split we used comes from GEOM-GCN, and some nodes are not assigned to training, validation, or testing, which is a mistake. Thank you for pointing this out. I didn't check the data split before, but I ensured that all baselines used the same data split.
Thank you for your reply.
Hi authors, I have read your paper, which is quite interesting. Thank you for your great work.
But I have a question about the split of the Cora dataset.
I counted the nodes in train_mask, val_mask, and test_mask at https://github.com/chennnM/GCNII/blob/ca91f5686c4cd09cc1c6f98431a5d5b7e36acc92/process.py#L157, which are 1192, 796, and 497 respectively. Their sum is 2,485, not 2,708, which differs from the number of nodes reported in your paper.
You can reproduce this with the following code:
    print('train_mask is %s' % train_mask.numpy().sum())
    print('val_mask is %s' % val_mask.numpy().sum())
    print('test_mask is %s' % test_mask.numpy().sum())
I don't understand why this happens. Could you please explain? I look forward to your response. Thanks!
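For anyone who wants to check a split more thoroughly than summing each mask, here is a minimal sketch that also reports nodes left out of every split and nodes that fall into more than one. The helper name `split_coverage` is hypothetical and not part of the GCNII repository, and the masks below are synthetic: they only mimic the counts reported in this thread (1192, 796, 497 out of 2,708 nodes), not the actual node assignment in the GEOM-GCN split.

```python
import numpy as np


def split_coverage(train_mask, val_mask, test_mask):
    """Count nodes per split, nodes in several splits, and nodes in none.

    Hypothetical helper for illustration; each mask is a boolean array
    with one entry per node.
    """
    masks = np.stack(
        [np.asarray(m, dtype=bool) for m in (train_mask, val_mask, test_mask)]
    )
    per_node = masks.sum(axis=0)  # number of splits each node belongs to
    return {
        "train": int(masks[0].sum()),
        "val": int(masks[1].sum()),
        "test": int(masks[2].sum()),
        "overlapping": int((per_node > 1).sum()),
        "unassigned": int((per_node == 0).sum()),
        "total": int(masks.shape[1]),
    }


# Synthetic masks (assumption): contiguous blocks sized to match the counts
# reported in this thread. The real split assigns different node indices.
num_nodes = 2708
train_mask = np.zeros(num_nodes, dtype=bool)
val_mask = np.zeros(num_nodes, dtype=bool)
test_mask = np.zeros(num_nodes, dtype=bool)
train_mask[:1192] = True
val_mask[1192:1988] = True
test_mask[1988:2485] = True

report = split_coverage(train_mask, val_mask, test_mask)
print(report)
```

With the masks loaded from process.py, passing `train_mask.numpy()`, `val_mask.numpy()`, and `test_mask.numpy()` should work in the same way, since the snippet above already converts them with `.numpy()`. A nonzero "unassigned" count would confirm that some nodes belong to no split.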