Data split on ms_academic

No class in the full MS Academic graph has fewer than 20 nodes. The class sizes are 708, 462, 2050, 429, 1394, 2193, 371, 924, 775, 118, 1444, 2033, 420, 4136, 876 (as a simple np.unique(ms_academic.labels, return_counts=True) will tell you).

However, you might end up with a class that has fewer than 20 nodes in your development (visible) set. That is why I increased the size of that set from 1500 to 5000 nodes for MS Academic (see the experimental setup in the paper or the notebook reproduce_results.ipynb). Even with the larger set this can still happen if you are unlucky when sampling. In that case, changing the random seed might fix it. Alternatively, you could just use a smaller training set. PPNP also works well with even smaller training sets.
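To see how often an unlucky sample occurs, here is a minimal sketch that draws a visible set and checks the per-class counts, trying seeds until one works. It is not the split code from the repository: the uniform sampling, the helper names sample_visible and find_valid_seed, and the synthetic label vector (built from the class sizes above as a stand-in for ms_academic.labels) are all assumptions made for illustration.

```python
import numpy as np

# Class sizes quoted above for the full MS Academic graph.
CLASS_SIZES = [708, 462, 2050, 429, 1394, 2193, 371, 924,
               775, 118, 1444, 2033, 420, 4136, 876]


def sample_visible(labels, n_visible, seed):
    """Uniformly sample n_visible node indices without replacement.

    Hypothetical helper; the repository's own sampling may differ.
    """
    rng = np.random.default_rng(seed)
    return rng.choice(len(labels), size=n_visible, replace=False)


def find_valid_seed(labels, n_visible, n_per_class=20, max_tries=100):
    """Return (seed, counts) for the first seed whose visible set contains
    at least n_per_class nodes of every class, or None if no seed works."""
    n_classes = int(labels.max()) + 1
    for seed in range(max_tries):
        idx = sample_visible(labels, n_visible, seed)
        counts = np.bincount(labels[idx], minlength=n_classes)
        if counts.min() >= n_per_class:
            return seed, counts
    return None


# Synthetic stand-in for ms_academic.labels with the same class sizes.
labels = np.repeat(np.arange(len(CLASS_SIZES)), CLASS_SIZES)

result = find_valid_seed(labels, n_visible=5000)
if result is None:
    print("No suitable seed found; try a larger visible set or fewer nodes per class.")
else:
    seed, counts = result
    print("seed:", seed, "smallest class in visible set:", counts.min())
```

Under uniform sampling, the smallest class (118 of 18333 nodes) contributes about 32 nodes on average to a visible set of 5000, but only about 10 to a set of 1500, which is why the smaller set almost never yields 20 nodes for that class.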
