gasteigerjo / ppnp

PPNP & APPNP models from "Predict then Propagate: Graph Neural Networks meet Personalized PageRank" (ICLR 2019)
https://www.daml.in.tum.de/ppnp
MIT License

Data split on ms_academic #6

Closed: Connor-XY closed this issue 4 years ago

Connor-XY commented 4 years ago

Hi, I tried to run your code on ms_academic but ran into a problem. As you say in the paper, you use 20 labeled nodes per class as training data. But one class in the ms_academic dataset has fewer than 20 nodes. How do you deal with this?

gasteigerjo commented 4 years ago

The full MS Academic graph does not have any class that has fewer than 20 nodes. The class sizes are: 708, 462, 2050, 429, 1394, 2193, 371, 924, 775, 118, 1444, 2033, 420, 4136, 876 (as a simple np.unique(ms_academic.labels, return_counts=True) will tell you).
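For readers without the dataset at hand, here is a minimal sketch of that check. It does not load the real data: the `labels` array is a synthetic stand-in for `ms_academic.labels`, rebuilt from the class sizes quoted above, so the class indices 0 to 14 are an assumption about ordering only.

```python
import numpy as np

# Class sizes quoted in the reply above (15 classes).
class_sizes = [708, 462, 2050, 429, 1394, 2193, 371, 924,
               775, 118, 1444, 2033, 420, 4136, 876]

# Synthetic stand-in for ms_academic.labels: one integer label per node.
labels = np.repeat(np.arange(len(class_sizes)), class_sizes)

# The check suggested in the reply: count the nodes in each class.
classes, counts = np.unique(labels, return_counts=True)
smallest = int(counts.min())

print(dict(zip(classes.tolist(), counts.tolist())))
print("number of nodes:", labels.size)
print("smallest class size:", smallest)
```

With the real dataset, the same `np.unique` call on the loaded label array should give the counts directly.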

However, you might end up with a class that has fewer than 20 nodes in your development (visible) set. That is why I increased the size of that set from 1500 to 5000 nodes for MS Academic (see the experimental setup in the paper or the notebook reproduce_results.ipynb). Even with the larger set, an unlucky sample can still leave a class with fewer than 20 nodes. In that case, changing the random seed might fix it. Or you could just use a smaller training set. PPNP also works well for even smaller training sets.
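The effect of the development set size can be sketched as follows. This is not the repository's splitting code: `sample_dev_set` is a hypothetical helper, uniform sampling without replacement is an assumption, and `labels` is again a synthetic stand-in built from the class sizes quoted earlier in the thread. For intuition, the smallest class (118 of 18333 nodes) contributes about 32 nodes on average to a 5000-node sample, but only about 10 to a 1500-node sample.

```python
import numpy as np

# Synthetic stand-in for ms_academic.labels, built from the quoted class sizes.
class_sizes = [708, 462, 2050, 429, 1394, 2193, 371, 924,
               775, 118, 1444, 2033, 420, 4136, 876]
labels = np.repeat(np.arange(len(class_sizes)), class_sizes)


def sample_dev_set(labels, n_dev, n_per_class=20, seed=0, max_tries=100):
    """Hypothetical helper: draw n_dev nodes uniformly at random and move on
    to the next seed until every class has at least n_per_class nodes."""
    n_classes = int(labels.max()) + 1
    for s in range(seed, seed + max_tries):
        rng = np.random.RandomState(s)
        dev_idx = rng.choice(len(labels), size=n_dev, replace=False)
        counts = np.bincount(labels[dev_idx], minlength=n_classes)
        if counts.min() >= n_per_class:
            return dev_idx, s
    raise ValueError("no seed produced enough nodes in every class")


# Development (visible) set of 5000 nodes, as used for MS Academic.
dev_idx, used_seed = sample_dev_set(labels, n_dev=5000)

# Training set: 20 labeled nodes per class, drawn from the development set.
rng = np.random.RandomState(used_seed)
train_idx = np.concatenate([
    rng.choice(dev_idx[labels[dev_idx] == c], size=20, replace=False)
    for c in range(len(class_sizes))
])

print("seed used:", used_seed)
print("training nodes:", train_idx.size)
```

Calling the helper with `n_dev=1500` will most likely exhaust all seeds and raise, which matches the problem described in the question.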