Sorry for the late reply; I was on vacation.
- Thank you for bringing that to our attention. We did not include the experiments on ogbl-collab initially. We will address the careless errors in our arXiv version.
- I recommend adding batch normalization to the encoder and using learnable embeddings as input node features (see the first sketch after this list).
- This learning process consists of two stages. First, we pre-train GraphMAE to obtain node representations. Then, we train the edge decoder on the learned representations for the link prediction task (see the second sketch after this list).
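To make the second suggestion concrete, here is a minimal sketch of such an encoder. It is only illustrative: the class name, hidden sizes, and activation are assumptions, not the settings used in this repository.

```python
import torch.nn as nn
from torch_geometric.nn import GCNConv


class EmbeddingGCNEncoder(nn.Module):
    """Illustrative encoder: learnable node embeddings as input features,
    two GCN layers, each followed by batch normalization."""

    def __init__(self, num_nodes, hidden=256, out=256):
        super().__init__()
        # Learnable embeddings replace raw node features.
        self.emb = nn.Embedding(num_nodes, hidden)
        self.conv1 = GCNConv(hidden, hidden)
        self.bn1 = nn.BatchNorm1d(hidden)
        self.conv2 = GCNConv(hidden, out)
        self.bn2 = nn.BatchNorm1d(out)
        self.act = nn.ELU()

    def forward(self, edge_index):
        x = self.emb.weight                               # [num_nodes, hidden]
        x = self.act(self.bn1(self.conv1(x, edge_index)))
        return self.bn2(self.conv2(x, edge_index))
```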
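For the two-stage workflow, a rough sketch is below. `pretrain_graphmae` is only a hypothetical placeholder for GraphMAE's masked-feature-reconstruction pre-training, and the MLP decoder and training step are generic examples rather than the exact code in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.utils import negative_sampling


class EdgeDecoder(nn.Module):
    """Generic MLP decoder scoring node pairs from (frozen) node representations."""

    def __init__(self, dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, z, edge_index):
        src, dst = edge_index
        return self.mlp(torch.cat([z[src], z[dst]], dim=-1)).squeeze(-1)


# Stage 1: pre-train the encoder and freeze the node representations.
# `pretrain_graphmae` is a hypothetical placeholder for that step:
# z = pretrain_graphmae(encoder, data).detach()


# Stage 2: train the edge decoder for link prediction with BCE.
def train_decoder_step(z, pos_edge_index, decoder, optimizer):
    decoder.train()
    optimizer.zero_grad()
    neg_edge_index = negative_sampling(pos_edge_index, num_nodes=z.size(0),
                                       num_neg_samples=pos_edge_index.size(1))
    pos_logit = decoder(z, pos_edge_index)
    neg_logit = decoder(z, neg_edge_index)
    loss = (F.binary_cross_entropy_with_logits(pos_logit, torch.ones_like(pos_logit))
            + F.binary_cross_entropy_with_logits(neg_logit, torch.zeros_like(neg_logit)))
    loss.backward()
    optimizer.step()
    return loss.item()
```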
Not the answer I was looking for about the 3rd question (I was asking about the pre-training configs and workflow of GraphMAE without node labels, e.g. whether the edges were split during the node-attribute-prediction pre-training, and how you validated the pre-trained model without the original logistic regression classifier). Still, thanks a lot for replying!
Hi Dr. Li,
Loved your work promoting self-supervised masked structural modeling! I am currently reproducing your results reported in Table 3 and have several questions.
Regarding `train_linkpred_ogb.py`:

- Section 6.1.2 in your paper stated that "For all datasets except arXiv, two GCN layers are applied for the encoder", but the default encoder for ogbl-collab is "sage" (`train_linkpred_ogb.py`, line 99). Also, Section 6.1.2 stated that MaskGAE uses BCE as the link prediction loss, but the implementation uses a different loss called "auc" (`train_linkpred_ogb.py`, line 212).
- Replacing `GNNEncoder` with `torch_geometric.nn.models.VGAE` or `torch_geometric.nn.models.ARGVA` (with an extra 2-layer GCN for variance learning) would not really work, because the encoder kept outputting large features and logits that led to very large loss values and training failure. Considering your results in Table 3, I would be very grateful for your advice on the training settings of those variational autoencoders (a rough sketch of what I tried is at the end of this post).

I much appreciate your timely reply so that I can cite your paper! Thank you so much!
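For reference, the VGAE variant was wired up roughly along these lines (a minimal sketch only; the class name, sizes, and the commented usage are illustrative, not taken from your repository):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, VGAE


class VariationalGCNEncoder(nn.Module):
    """Two GCN layers for the mean, plus an extra 2-layer GCN branch for the log-std."""

    def __init__(self, in_dim, hidden=256, out=256):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv_mu = GCNConv(hidden, out)
        # Extra 2-layer branch for the (log) variance.
        self.conv_var1 = GCNConv(in_dim, hidden)
        self.conv_var2 = GCNConv(hidden, out)

    def forward(self, x, edge_index):
        mu = self.conv_mu(torch.relu(self.conv1(x, edge_index)), edge_index)
        logstd = self.conv_var2(torch.relu(self.conv_var1(x, edge_index)), edge_index)
        return mu, logstd


# Illustrative usage, assuming `data.train_pos_edge_index` holds the training edges:
# model = VGAE(VariationalGCNEncoder(data.num_features))
# z = model.encode(data.x, data.train_pos_edge_index)
# loss = model.recon_loss(z, data.train_pos_edge_index) + model.kl_loss() / data.num_nodes
```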