Thank you for your fascinating work! I have several questions about this paper:
Regarding the training of the mask generator: in Equation 9, does the probability represent a binary classification of whether a specific node will be masked? If so, is the purpose of the Gumbel-Softmax simply to generate a discrete mask vector (0 or 1) while maintaining differentiability? Otherwise, the mask vector could be decided directly from the probability.
Besides, I am curious whether you tried the link prediction task.
Did you try other GNN models for the mask generator? GraphMAE depends heavily on GAT and performs worse when other GNN models are used.
Did you try sampling (neighbor sampling or subgraph sampling)?
Thanks for your interest in our work! Regarding your questions:
The "prob" in Equation 9 is the probability vector for each node being masked. A binary vector "m" (as shown in Equation 10) is then sampled from this vector using the Gumbel-Softmax, which indicates whether each node is masked. The Gumbel-Softmax is used to make the sampling operation differentiable.
We followed the experimental setup of GraphMAE and did not attempt the link prediction task.
We experimented with different GNN architectures as the mask generator. On most datasets, the GNN architecture of the mask generator we used is consistent with that of the encoder & decoder. You can refer to configs.yml for details.
Thanks.