facebookresearch / hgnn

Hyperbolic Graph Neural Networks

Problem with reproducing the results of COLLAB and REDDIT datasets #19

Open MarcTLaw opened 3 years ago

MarcTLaw commented 3 years ago

Hi,

I have been unable to reproduce the results for the 10-fold cross-validation setting in Appendix Section E. I managed to do it for the D&D, Enzymes and Proteins datasets, but not for the Reddit and Collab datasets, although I followed the data processing steps.

Collab: For the Lorentz manifold, the reported accuracy is 88.96%. I ran the code with the same settings, and also tried different optimizers and learning rates and increased the number of epochs and the patience parameter. The average score is usually about 81% (79 to 83%).

Reddit: Same as above. The reported accuracy is 53%, but the accuracy on every split is always below 50%.
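(For reference, by "average score" I mean a plain mean over the 10 test folds, along the lines of this simplified sketch; `summarize` and `per_fold_accs` are just illustrative names, not part of the hgnn code.)

```python
import numpy as np

# Simplified sketch of how I aggregate the 10-fold results (not the hgnn
# evaluation code itself): per_fold_accs is the list of test accuracies,
# in percent, collected from the ten training runs.
def summarize(per_fold_accs):
    accs = np.asarray(per_fold_accs, dtype=float)
    return accs.mean(), accs.std()
```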

Do you have any idea how to fix this?

Thanks in advance.

leuchine commented 3 years ago

Hi Marc,

Thanks for raising this issue. As these datasets are small and the results usually have high variance, they are a bit sensitive to seeds and hyperparameter settings. Could you please provide the parameters you used on these two datasets? Also, have you tried changing the activation function? Thanks!

Best Regards, Qi

MarcTLaw commented 3 years ago

Hi Qi,

I used the exact same code and the parameters from utils/CollabHyperbolicParams.py and utils/RedditUltraHyperbolicParams.py, and I tried all combinations of the sgd/adam/amsgrad optimizers with learning rates 0.1, 0.01, 0.001 and 0.0001.

I haven't tried changing the activation function yet, so I have only used relu for Collab and rrelu for Reddit. I will try other activation functions; the sketch below shows the sweep I have in mind.
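For concreteness, the sweep looks roughly like this (a simplified sketch: `build_model` and `train_and_eval` are placeholders for the repository's model construction and 10-fold training/evaluation, and the `activation=` keyword is only an assumption about how the activation would be wired in):

```python
import itertools
import torch
import torch.nn as nn

# Sketch of the optimizer / learning-rate / activation sweep described
# above (not the actual hgnn code). build_model and train_and_eval stand
# in for the repository's model construction and evaluation routine.
optimizers = {
    "sgd":     lambda params, lr: torch.optim.SGD(params, lr=lr),
    "adam":    lambda params, lr: torch.optim.Adam(params, lr=lr),
    "amsgrad": lambda params, lr: torch.optim.Adam(params, lr=lr, amsgrad=True),
}
learning_rates = [0.1, 0.01, 0.001, 0.0001]
activations = {          # relu / rrelu so far; others still to try
    "relu":  nn.ReLU(),
    "rrelu": nn.RReLU(),
    "elu":   nn.ELU(),
}

def sweep(build_model, train_and_eval):
    results = {}
    for (opt_name, make_opt), lr, (act_name, act) in itertools.product(
            optimizers.items(), learning_rates, activations.items()):
        model = build_model(activation=act)          # assumed keyword
        optimizer = make_opt(model.parameters(), lr)
        results[(opt_name, lr, act_name)] = train_and_eval(model, optimizer)
    return results
```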

What other hyperparameters should I play with?

Thanks in advance.

leuchine commented 3 years ago

Thanks, Marc. The settings look good to me.

If you are interested, you could also try tuning the batch size, dropout and number of centroids. Thanks!
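Something along these lines would cover those three (the values are only illustrative, `train_and_eval` is again a placeholder for the actual training script, and the real parameter names in the utils/*Params.py files may differ slightly):

```python
import itertools

# Rough grid over the remaining hyperparameters (illustrative values only;
# train_and_eval is a placeholder for the actual training script, and the
# key names are assumptions rather than the exact hgnn parameter names).
grid = {
    "batch_size":    [16, 32, 64],
    "dropout":       [0.0, 0.1, 0.3, 0.5],
    "num_centroids": [50, 100, 200],
}

def run_grid(train_and_eval):
    keys = list(grid)
    results = {}
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        results[tuple(values)] = train_and_eval(**config)
    return results
```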