Connor-XY closed this issue 4 years ago
The full MS Academic graph does not have any class with fewer than 20 nodes. The class sizes are: 708, 462, 2050, 429, 1394, 2193, 371, 924, 775, 118, 1444, 2033, 420, 4136, 876 (as a simple `np.unique(ms_academic.labels, return_counts=True)` will tell you).
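For anyone who wants to run that check without loading the dataset, here is a minimal sketch. The `labels` array is a stand-in rebuilt from the class sizes listed above, not the real `ms_academic.labels`; with the actual dataset loaded you would pass `ms_academic.labels` directly.

```python
import numpy as np

# Class sizes as listed above for the full MS Academic graph.
class_sizes = [708, 462, 2050, 429, 1394, 2193, 371, 924,
               775, 118, 1444, 2033, 420, 4136, 876]

# Stand-in label vector (assumption): one integer label per node,
# rebuilt from the sizes. Replace with ms_academic.labels for the real data.
labels = np.repeat(np.arange(len(class_sizes)), class_sizes)

# Count how many nodes carry each label.
classes, counts = np.unique(labels, return_counts=True)

smallest_class = classes[counts.argmin()]
print(f"{labels.size} nodes, {classes.size} classes")
print(f"smallest class: {smallest_class} with {counts.min()} nodes")
```

The smallest class has 118 nodes, so 20 training nodes per class is always possible on the full graph. The shortage can only come from the sampled subset.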
However, you might end up with a class with fewer than 20 nodes in your development (visible) set. That is why I increased the number of nodes in that set from 1500 to 5000 for MS Academic (see the experimental setup in the paper or the notebook `reproduce_results.ipynb`). Even with the larger set, this can still happen if the random sample is unlucky; changing the random seed should fix it in that case. Alternatively, you could use a smaller training set: PPNP also works well with even smaller training sets.
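A rough sketch of the "change the seed" workaround, in case it helps. This is not the splitting code from the repository: `sample_visible_set` and its parameters are hypothetical names, the visible set is drawn uniformly at random as an assumption, and `labels` is again a stand-in built from the class sizes above.

```python
import numpy as np

# Stand-in labels (assumption), rebuilt from the class sizes listed above.
class_sizes = [708, 462, 2050, 429, 1394, 2193, 371, 924,
               775, 118, 1444, 2033, 420, 4136, 876]
labels = np.repeat(np.arange(len(class_sizes)), class_sizes)


def sample_visible_set(labels, n_visible, n_per_class, seed, max_tries=100):
    """Hypothetical helper: draw a visible set, moving to the next seed
    until every class has at least n_per_class nodes in it."""
    n_classes = labels.max() + 1
    for offset in range(max_tries):
        rng = np.random.default_rng(seed + offset)
        # Uniform sample of node indices without replacement.
        visible = rng.choice(labels.size, size=n_visible, replace=False)
        # Per-class node counts inside the sampled set.
        counts = np.bincount(labels[visible], minlength=n_classes)
        if counts.min() >= n_per_class:
            return visible, seed + offset
    raise RuntimeError(
        f"no seed in [{seed}, {seed + max_tries}) gave {n_per_class} "
        f"nodes per class; try a larger n_visible or a smaller n_per_class"
    )


visible, used_seed = sample_visible_set(labels, n_visible=5000,
                                        n_per_class=20, seed=0)
visible_counts = np.bincount(labels[visible], minlength=len(class_sizes))
print(f"seed {used_seed}: smallest class has {visible_counts.min()} visible nodes")
```

With 5000 of the 18333 nodes visible, the 118-node class contributes about 32 nodes on average, so a retry is rarely needed. With only 1500 visible nodes the average drops below 10, which is why the smaller set does not work for 20 nodes per class.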
Hi, I tried to run your code on ms_academic but ran into a problem. As you say in the paper, you use 20 labeled nodes per class as training data. But there is a class in the ms_academic dataset that has fewer than 20 nodes. How do you deal with this?