acbull / GPT-GNN

Code for KDD'20 "Generative Pre-Training of Graph Neural Networks"
MIT License

About the experiment in GPT-GNN #19

Open Mobzhang opened 4 years ago

Mobzhang commented 4 years ago

Hi authors, thanks for your amazing work on pre-training GNNs; it makes it possible to handle much larger graphs with GNN models. I have some questions while trying to understand the experimental part. I looked at your code on GitHub and noticed that it includes both pre-training and fine-tuning, and I have a question about the experiments in your paper.

  1. How do you conduct the experiments for GraphSAGE and GAE? If I split the data into 0-70% for pre-training, 70-80% for training, 80-90% for validation, and 90-100% for testing, should I feed the 0-80% portion into GraphSAGE and GAE as training data? Looking forward to your reply, thank you!
Mobzhang commented 4 years ago

And for no_pretrain, I see from your code that finetune_reddit.py only uses train_data, valid_data, and test_data. Am I right?

acbull commented 4 years ago

For all of the pre-training baselines (including GAE, GraphSAGE-unsup, and our method), the setting follows the same pretrain-finetune paradigm: we first pre-train the model with the self-supervised task on the pre-training split (in your example, 0-70%), then fine-tune the pre-trained model on the training set (70-80%), do model selection on the validation set, and report generalization performance on the test set.

Yes, for no_pretrain, we don't leverage the pre-training data.
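In case the split setup is still unclear, here is a minimal, self-contained sketch of that protocol. It is only an illustration, not the repo's actual code: the encoder is an MLP on random features standing in for the GNN, the reconstruction loss stands in for the self-supervised objective, and all sizes and names are made up. The no_pretrain baseline simply skips step 1.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
num_nodes, feat_dim, hid_dim, num_classes = 1000, 32, 64, 5
x = torch.randn(num_nodes, feat_dim)             # node features
y = torch.randint(0, num_classes, (num_nodes,))  # node labels (only used after pre-training)

perm = torch.randperm(num_nodes)
pre_idx   = perm[:700]      # 0-70%  : self-supervised pre-training only
train_idx = perm[700:800]   # 70-80% : supervised fine-tuning
valid_idx = perm[800:900]   # 80-90% : model selection
test_idx  = perm[900:]      # 90-100%: final report

encoder    = nn.Sequential(nn.Linear(feat_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim))
decoder    = nn.Linear(hid_dim, feat_dim)        # stand-in generative head (attribute reconstruction)
classifier = nn.Linear(hid_dim, num_classes)

# 1) Pre-train the encoder with a self-supervised loss on the 0-70% split only.
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    nn.functional.mse_loss(decoder(encoder(x[pre_idx])), x[pre_idx]).backward()
    opt.step()

# 2) Fine-tune encoder + classifier on 70-80%, select on valid, report on test.
opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)
best_val, best_state = -1.0, None
for _ in range(100):
    opt.zero_grad()
    nn.functional.cross_entropy(classifier(encoder(x[train_idx])), y[train_idx]).backward()
    opt.step()
    with torch.no_grad():
        val_acc = (classifier(encoder(x[valid_idx])).argmax(1) == y[valid_idx]).float().mean().item()
    if val_acc > best_val:
        best_val   = val_acc
        best_state = (copy.deepcopy(encoder.state_dict()), copy.deepcopy(classifier.state_dict()))

encoder.load_state_dict(best_state[0])
classifier.load_state_dict(best_state[1])
with torch.no_grad():
    test_acc = (classifier(encoder(x[test_idx])).argmax(1) == y[test_idx]).float().mean().item()
print(f"valid acc {best_val:.3f} | test acc {test_acc:.3f}")
```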

Mobzhang commented 4 years ago

Thanks very much for your reply! I'm still a little confused about GraphSAGE and GAE: did you feed all of the pre-training data into the model, or did you sample data as described in your paper?

acbull commented 4 years ago

Pre-training uses all of the pre-training data, but we conduct mini-batch training via subgraph sampling to avoid memory issues (the whole graph is too big to fit in GPU memory).
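For readers unfamiliar with this pattern, the sketch below shows the general idea of subgraph-sampled mini-batch training on a toy homogeneous graph. It is only an illustration of the concept, not the sampler used in this repo, and the fanout/hop numbers are arbitrary.

```python
import random

random.seed(0)

def sample_subgraph(adj, seed_nodes, num_hops=2, fanout=10):
    """Expand seed nodes by sampling up to `fanout` neighbours per node per hop,
    then keep only the edges whose endpoints were both sampled."""
    nodes, frontier = set(seed_nodes), set(seed_nodes)
    for _ in range(num_hops):
        nxt = set()
        for u in frontier:
            neigh = adj.get(u, [])
            nxt.update(random.sample(neigh, min(fanout, len(neigh))))
        frontier = nxt - nodes
        nodes |= nxt
    edges = [(u, v) for u in nodes for v in adj.get(u, []) if v in nodes]
    return sorted(nodes), edges

# Toy graph with 10k nodes; each step trains on one sampled subgraph instead of the
# full graph, so only that subgraph has to fit in GPU memory.
adj = {i: random.sample(range(10_000), 5) for i in range(10_000)}
pretrain_nodes = range(7_000)                    # the 0-70% pre-training split
for step in range(3):
    seeds = random.sample(pretrain_nodes, 256)   # mini-batch of seed nodes
    nodes, edges = sample_subgraph(adj, seeds)
    print(f"step {step}: {len(nodes)} nodes, {len(edges)} edges in the sampled subgraph")
    # the GNN forward/backward pass would run on (nodes, edges) here
```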