AIML-K / GNN_Survey

updating papers related to GNN

Graph neural network-inspired kernels for gaussian processes in semi-supervised learning #7

Open 2nazero opened 1 week ago

2nazero commented 1 week ago

Graph neural network-inspired kernels for gaussian processes in semi-supervised learning

@article{niu2023graph,
  title={Graph neural network-inspired kernels for gaussian processes in semi-supervised learning},
  author={Niu, Zehao and Anitescu, Mihai and Chen, Jie},
  journal={arXiv preprint arXiv:2302.05828},
  year={2023}
}
2nazero commented 2 days ago

Main Idea - The combination of GNNs and GPs

Found a relevant paper that explains why wide neural networks behave as GPs: Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models

GCN as GP

All proofs are given in the appendix of this paper!

Equation (1): GCN Layer Formula

$$ X^{(l)} = \phi\left( A X^{(l-1)} W^{(l)} + b^{(l)} \right), $$
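
To make equation (1) concrete, here is a minimal NumPy sketch of a single GCN layer; the ReLU activation, the row-normalized adjacency, and the toy shapes are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

def gcn_layer(A, X, W, b, phi=lambda z: np.maximum(z, 0.0)):
    """One GCN layer: X^{(l)} = phi(A X^{(l-1)} W^{(l)} + b^{(l)}).

    A   : (N, N) normalized adjacency (assumed precomputed)
    X   : (N, d_in) node features from the previous layer
    W   : (d_in, d_out) layer weights, b : (d_out,) bias
    phi : elementwise nonlinearity (ReLU assumed here)
    """
    return phi(A @ X @ W + b)

# toy usage: 3 nodes, 2 input features, 4 hidden units
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])            # assumed row-normalized adjacency
X0 = rng.normal(size=(3, 2))
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
X1 = gcn_layer(A, X0, W1, b1)              # shape (3, 4)
```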

Equation (2): Element-wise GCN Transformation

$$ x_i^{(l)}(x) = \phi\left(z_i^{(l)}(x)\right), \quad z_i^{(l)}(x) = b_i^{(l)} + \sum_{v \in \mathcal{V}} A_{xv} \, y_i^{(l)}(v), $$

Equation (3): Covariance Calculation in GP Interpretation

$$ C^{(l)} = \mathbb{E}_{z_i^{(l)} \sim \mathcal{N}(0, K^{(l)})} \left[\phi(z_i^{(l)}) \phi(z_i^{(l)})^T \right], \quad l = 1, \ldots, L, $$

Equation (4): Covariance Update for the Next Layer

$$ K^{(l+1)} = \sigma_b^2 \, 1_{N \times N} + \sigma_w^2 \, A C^{(l)} A^T, \quad l = 0, \ldots, L - 1. $$
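
To see how equations (3) and (4) play out numerically, here is a hedged NumPy sketch that estimates the expectation in (3) by Monte Carlo sampling and then applies the update in (4). The paper works with closed-form limit expressions; the sampling approach, the ReLU activation, and the handling of the initial $K^{(0)}$ below are my own simplifications for illustration.

```python
import numpy as np

def gp_covariance_recursion(A, K0, L, sigma_b=1.0, sigma_w=1.0,
                            phi=lambda z: np.maximum(z, 0.0), n_samples=20000):
    """Iterate K^{(l+1)} = sigma_b^2 * 1_{NxN} + sigma_w^2 * A C^{(l)} A^T,
    where C^{(l)} = E_{z ~ N(0, K^{(l)})}[phi(z) phi(z)^T] is estimated by
    Monte Carlo (the paper uses analytic expressions instead of sampling)."""
    N = A.shape[0]
    K = K0.copy()
    rng = np.random.default_rng(0)
    jitter = 1e-8 * np.eye(N)                        # numerical stabilizer
    for _ in range(L):
        # draw samples z ~ N(0, K): each row of Z is one draw over the N nodes
        Z = rng.multivariate_normal(np.zeros(N), K + jitter, size=n_samples)
        C = (phi(Z).T @ phi(Z)) / n_samples          # MC estimate of C^{(l)}
        K = sigma_b**2 * np.ones((N, N)) + sigma_w**2 * A @ C @ A.T
    return K

# toy usage: 3-node graph, assumed linear-kernel initialization K^{(0)} = X X^T
A = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])
X = np.random.default_rng(1).normal(size=(3, 2))
K_L = gp_covariance_recursion(A, X @ X.T, L=2)       # (3, 3) GP covariance
```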

2nazero commented 2 days ago

Scalable Computation through Low-Rank Approximation

Key Approach

Goal

2nazero commented 2 days ago

Composing Graph Neural Network-Inspired Kernels

2nazero commented 2 days ago

Experiments

Why was this experiment conducted?

The experiments aim to demonstrate the prediction performance of GP kernels derived by taking the infinite-width limit of GCNs and other GNN architectures. The primary goal is to show that these GPs are comparable to the corresponding GNNs in prediction accuracy while being significantly faster to compute.

Datasets

The experiments were conducted on multiple benchmark datasets, covering both classification and regression tasks:

Evaluation Setup

  1. Prediction Performance (GCN-Based Comparison):

    • The GP kernels used include:
      • GCNGP: Derived from the limiting case of GCN.
      • RBF: A standard squared-exponential kernel.
      • GGP: A Gaussian Process kernel from related literature.
    • Each of these kernels has a low-rank version (indicated by the suffix -X), which uses a Nyström approximation for scalability (a generic sketch of this construction follows after this list).
  2. Performance Metrics:

    • Classification: Micro-F1 score.
    • Regression: $R^2$ (coefficient of determination).
  3. Experiment 1: Comparing GCNs and GPs:

    • The results show that GCNGP performs comparably to, or slightly better than, GCN and other GP kernels. The low-rank version (GCNGP-X) achieves competitive results while being more computationally efficient.
  4. Experiment 2: Comparing with Other GNNs:

    • The study extends the comparison to popular GNN architectures such as GCNII, GIN, and GraphSAGE.
    • The results show that the GP versions of these architectures perform similarly to the GNNs, but with efficiency advantages on specific datasets such as PubMed and Reddit.
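
Since the -X variants rely on a Nyström approximation, here is a minimal, generic sketch of the Nyström low-rank construction; the RBF stand-in kernel, the random landmark selection, and the toy shapes are assumptions for illustration and do not reproduce the paper's exact GCNGP-X pipeline.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel, used here only as a stand-in kernel."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def nystrom_features(X, landmarks, kernel=rbf_kernel, jitter=1e-8):
    """Nystrom approximation K ~ K_nm K_mm^{-1} K_mn, via explicit features
    Phi = K_nm K_mm^{-1/2} so that Phi @ Phi.T approximates K."""
    m = len(landmarks)
    K_mm = kernel(landmarks, landmarks) + jitter * np.eye(m)
    K_nm = kernel(X, landmarks)
    # inverse square root of K_mm via eigendecomposition
    w, V = np.linalg.eigh(K_mm)
    K_mm_inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.clip(w, jitter, None))) @ V.T
    return K_nm @ K_mm_inv_sqrt            # (N, m) low-rank feature map

# toy usage: m landmark points sampled from the data (assumed selection rule)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
m = 50
landmarks = X[rng.choice(len(X), size=m, replace=False)]
Phi = nystrom_features(X, landmarks)       # work with (N, m) features
                                           # instead of the full (N, N) kernel
```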

Running Time and Scalability

  1. Running Time Comparison:

    • The comparison shows that GCNGP-X is generally faster than GCN, with some datasets showing a speedup of one to two orders of magnitude. This is illustrated in Figure 1.
  2. Scalability Analysis:

    • Figure 2 demonstrates that the running time scales approximately linearly with respect to the graph size ($M + N$) for GCNGP-X and GCN. However, GCNGP shows cubic scaling, making the low-rank approximation more practical for large graphs.
  3. Analysis on the Depth:

    • The study examines the impact of increasing the number of layers. The results (Figure 3) indicate that both GCN and GCNII suffer from oversmoothing at larger depths. However, their GP counterparts (e.g., GCNGP) remain stable even for a depth as large as 12.
  4. Analysis on the Landmark Set:

    • The number of landmark nodes ($N_a$) affects the trade-off between approximation quality and running time. Figure 4 shows that using only $1/800$ of the training set as landmarks achieves accuracy comparable to GCN, while the computational cost remains much lower.

Results

Key Takeaway

The experiments show that GP kernels derived from GCNs can achieve comparable prediction performance to GNNs while being computationally more efficient. This demonstrates the practicality of the proposed GP approach for large-scale and deep graph-based tasks.

2nazero commented 2 days ago

I have reviewed this paper but found it challenging to fully comprehend the mathematical proofs and rigorous details presented in the equations. If anyone could provide insights or comments, I would greatly appreciate it :)

d-h-lee commented 2 days ago

> I have reviewed this paper but found it challenging to fully comprehend the mathematical proofs and rigorous details presented in the equations. If anyone could provide insights or comments, I would greatly appreciate it :)

to fully comprehend the mathematical proofs --> as you clearly mentioned in https://github.com/AIML-K/GNN_Survey/issues/7#issuecomment-2430977827, understanding the NN-GP equivalence will help you understand this. In particular, papers showing that infinitely wide NNs are equivalent to GPs will be useful.

Don't go to the first source (Radford M Neal. Priors for infinite networks. In Bayesian Learning for Neural Networks, pages 29–53. Springer, 1996. -- too statistical). I recommend https://arxiv.org/abs/1711.00165 or https://openreview.net/pdf?id=rkl4aESeUH .

Meanwhile, your decision to attend the GP workshop will pay off.