gasteigerjo / ppnp

PPNP & APPNP models from "Predict then Propagate: Graph Neural Networks meet Personalized PageRank" (ICLR 2019)
https://www.daml.in.tum.de/ppnp
MIT License

Gap between the results of pytorch implementation and tensorflow implementation #5

Closed · yaokl-nju closed this 4 years ago

yaokl-nju commented 4 years ago

Hi,

Thanks for releasing the code; this is very interesting work! However, I found a gap between the results of the PyTorch implementation (e.g. 82.5% accuracy on cora_ml) and the TensorFlow implementation (e.g. 85.2% accuracy on cora_ml). Why is there such a gap between PyTorch and TensorFlow?

I would very much appreciate it if you could answer my question!

gasteigerjo commented 4 years ago

This difference is not due to PyTorch/TensorFlow, but due to the inherent stochasticity of the results. Whenever you run a GNN you will observe large variations in the resulting accuracy, depending on the weight initialization and the data split (train/validation/test). The simple examples in this repository are only meant to demonstrate the model and will not always produce the same result. To obtain consistent, robust results you need to run the model many times, use varying splits, and pay special attention to hyperparameters and early stopping, as explained in the paper and shown in reproduce_results.ipynb.
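
As a rough illustration of that protocol (not the repo's actual API), the evaluation loop looks something like the sketch below, where `train_and_evaluate` is a hypothetical stand-in for a full training run:

```python
import numpy as np
import torch

def train_and_evaluate(split_seed: int) -> float:
    """Hypothetical stand-in for a full PPNP training run. In practice this
    would build the data split from `split_seed`, train the model, and
    return the test accuracy; here it just returns a dummy value."""
    rng = np.random.default_rng(split_seed)
    return float(rng.normal(loc=0.85, scale=0.01))

def evaluate_robustly(split_seeds, inits_per_seed=5):
    """Average the accuracy over many splits and weight initializations."""
    accuracies = []
    for split_seed in split_seeds:
        for init in range(inits_per_seed):
            torch.manual_seed(init)  # vary the weight initialization
            accuracies.append(train_and_evaluate(split_seed))
    accs = np.asarray(accuracies)
    return accs.mean(), accs.std()

# 20 split seeds x 5 initializations = 100 runs, as in reproduce_results.ipynb
mean_acc, std_acc = evaluate_robustly(split_seeds=range(20))
print(f"Accuracy: {mean_acc:.3f} +/- {std_acc:.3f} over 100 runs")
```

The important point is to report the mean and uncertainty over all runs instead of the accuracy of a single run.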

yaokl-nju commented 4 years ago

Thanks for the reply!

I understand your point, but the results I mentioned above were obtained (with your PyTorch code) under the same settings as in reproduce_results.ipynb, i.e. 100 runs: 20 different random seeds for the splits and 5 random initializations per seed. As mentioned above, the gap between the results on the two platforms is large.

I wonder whether you can achieve results comparable to those in the paper with your PyTorch code. If you could release complete PyTorch code for reproducing the paper's results, that would help a lot.

Thanks again for the patient reply. I would very much appreciate an answer to my question above!

gasteigerjo commented 4 years ago

That is interesting. I am sure you have overlooked some detail (dataset split seeds? weight decay?), because the method has been successfully reproduced multiple times elsewhere, and the results in the two example notebooks are at around the same level. Unfortunately, I can't help you find that bug, and I currently have no time to reproduce everything myself in PyTorch. I did all of the original research in TensorFlow and only provided the PyTorch implementation for better accessibility, so unfortunately I have nothing to share along those lines.
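
For debugging, these are the kinds of settings I would cross-check between the two runs; the sketch below uses illustrative names and values drawn from the paper's setup, not the repo's exact configuration:

```python
# Hypothetical checklist of settings that must match between the
# TensorFlow and PyTorch runs; names/values are illustrative assumptions,
# so cross-check them against the paper and the example notebooks.
config = {
    "hidden_units": 64,              # size of the hidden layer
    "dropout": 0.5,                  # dropout rate
    "learning_rate": 0.01,
    "weight_decay": 5e-3,            # L2 regularization; easy to miss in a port
    "alpha": 0.1,                    # PPR teleport probability
    "niter": 10,                     # power-iteration steps in APPNP
    "split_seeds": list(range(20)),  # dataset split seeds
}
```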

But please do keep me posted when you find an explanation!

yaokl-nju commented 4 years ago

Thank you for the patient reply!

Maybe I didn't explain it well before. The seeds are used to split the data into train, early-stopping, and validation sets, right? But I used exactly the code you provide and got the results above.
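
To make sure we mean the same thing, here is a simplified sketch of what I understand by a seeded split (names and sizes are illustrative, not your actual code; the paper samples the training nodes per class, which this sketch simplifies to a fixed-size random sample):

```python
import numpy as np

def gen_split(n_nodes: int, seed: int, n_train: int = 140, n_stopping: int = 500):
    """Illustrative seeded split into train / early-stopping / validation
    node index sets; sizes are placeholders, not the paper's exact setup."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_nodes)
    train = perm[:n_train]
    stopping = perm[n_train:n_train + n_stopping]  # early-stopping set
    validation = perm[n_train + n_stopping:]       # held-out evaluation
    return train, stopping, validation

train_idx, stop_idx, val_idx = gen_split(n_nodes=2810, seed=0)  # ~cora_ml size
```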

Thank you again!

gasteigerjo commented 4 years ago

Okay, I've now done exactly what you described and changed the reproduce_results.ipynb notebook slightly to run with the PyTorch version. That actually reproduces the TensorFlow results perfectly. So I do not know which bug leads to your results, but you can now have a look at reproduce_results_pytorch.ipynb for inspiration. :)
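
The change itself is small; roughly speaking, it amounts to swapping the model and training imports, along the lines of the sketch below (the module paths here are illustrative assumptions about the repo layout, not verified imports):

```python
# Illustrative sketch of porting reproduce_results.ipynb to PyTorch.
# The module/class names are assumptions about the repo layout, not
# verified imports -- check the actual notebooks for the real ones.

# from ppnp.tensorflow import PPNP, train_model   # original TensorFlow run
from ppnp.pytorch import PPNP, train_model        # PyTorch counterpart

# The rest of the notebook (data loading, split seeds, hyperparameters)
# stays identical, which is what makes the two runs comparable.
```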