2nazero opened this issue 1 week ago
Found a relevant paper that explains how wide neural networks behave as Gaussian processes: *Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models*
All proofs are given in the appendix of this paper!
$$ X^{(l)} = \phi\left( A X^{(l-1)} W^{(l)} + b^{(l)} \right), $$
$$ x_i^{(l)}(x) = \phi\left(z_i^{(l)}(x)\right), \quad z_i^{(l)}(x) = b_i^{(l)} + \sum_{v \in \mathcal{V}} A_{xv} y_i^{(l)}(v), $$
$$ C^{(l)} = \mathbb{E}_{z_i^{(l)} \sim \mathcal{N}(0, K^{(l)})} \left[\phi(z_i^{(l)}) \phi(z_i^{(l)})^T \right], \quad l = 1, \ldots, L, $$
$$ K^{(l+1)} = \sigma_b^2 \mathbf{1}_{N \times N} + \sigma_w^2 A C^{(l)} A^T, \quad l = 0, \ldots, L - 1. $$
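To make the recursion concrete, here is a minimal NumPy sketch of the covariance update above, assuming a ReLU activation (for which the expectation defining $C^{(l)}$ has a closed-form arc-cosine expression) and a simple feature-based initialization of $K^{(0)}$; both choices are illustrative assumptions, not details quoted from the paper.

```python
import numpy as np

def relu_expectation(K):
    """E[phi(z_i) phi(z_j)] for z ~ N(0, K) with phi = ReLU
    (degree-1 arc-cosine formula; assumes ReLU, other activations
    need their own expectation)."""
    d = np.sqrt(np.clip(np.diag(K), 1e-12, None))
    cos_theta = np.clip(K / np.outer(d, d), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return np.outer(d, d) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def gcn_gp_kernel(A, X, L, sigma_b2=0.1, sigma_w2=1.0):
    """Iterates K^{(l+1)} = sigma_b^2 * 1_{NxN} + sigma_w2 * A C^{(l)} A^T.

    A: (N, N) normalized adjacency, X: (N, d) node features.
    The feature-based K^{(0)} below is a hypothetical initialization.
    """
    N = A.shape[0]
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]  # hypothetical K^{(0)}
    for _ in range(L):
        C = relu_expectation(K)                        # C^{(l)} under N(0, K^{(l)})
        K = sigma_b2 * np.ones((N, N)) + sigma_w2 * A @ C @ A.T
    return K
```

A quick sanity check is to estimate $C^{(l)}$ by Monte Carlo, sampling $z \sim \mathcal{N}(0, K^{(l)})$ and averaging $\phi(z)\phi(z)^T$; the closed form above should match that estimate.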
Purpose: This section explains how the covariance matrix of the Gaussian Process (GP) model derived from a Graph Convolutional Network (GCN) can be programmatically composed to match different GNN architectures.
Key Idea: Using a programmable procedure, the covariance matrix and its low-rank approximation can be transformed automatically in the same way as the operations in a GNN, making it possible to create corresponding kernels for various GNN architectures such as GCNII, GIN, and GraphSAGE.
Transformation Process: each GNN building block (neighborhood aggregation with A, a dense layer, an activation) induces a corresponding transformation of the covariance matrix and of its low-rank approximation, and these transformations are composed in the same order as the network's layers.
Practical Examples: composing the transformations accordingly yields GP kernels matching architectures such as GCNII, GIN, and GraphSAGE.
Main Takeaway: By treating the transformations of the covariance matrix as programmable operations, the paper shows how any GNN architecture can be paired with a corresponding GP kernel, making it possible to extend the GP model to new GNN designs (a sketch of this compositional view follows).
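As a rough illustration of how such a programmable composition might look, the sketch below expresses each GNN building block as a map on the covariance matrix and chains them. It reuses the hypothetical `relu_expectation` helper from the previous snippet, and the operation names and the GCNII-style residual rule are illustrative assumptions, not the paper's exact definitions.

```python
# Assumes relu_expectation from the previous sketch is in scope.

def aggregate(A):
    return lambda K: A @ K @ A.T                 # neighborhood aggregation

def dense(sigma_b2, sigma_w2):
    return lambda K: sigma_b2 + sigma_w2 * K     # fully connected layer

def activation(expectation=relu_expectation):
    return lambda K: expectation(K)              # nonlinearity via its Gaussian expectation

def compose(*ops):
    """Chain covariance transformations in the same order as the GNN's layers."""
    def kernel(K):
        for op in ops:
            K = op(K)
        return K
    return kernel

# A GCN-style layer kernel: aggregate -> dense -> activation.
def gcn_layer(A, sigma_b2=0.1, sigma_w2=1.0):
    return compose(aggregate(A), dense(sigma_b2, sigma_w2), activation())

# A hypothetical GCNII-style layer mixes in the initial covariance K0
# (initial residual connection); the mixing rule here is an assumption.
def gcnii_layer(A, K0, alpha=0.1, sigma_b2=0.1, sigma_w2=1.0):
    return lambda K: activation()(dense(sigma_b2, sigma_w2)(
        (1 - alpha) * aggregate(A)(K) + alpha * K0))
```

The point of the design is that adding a new architecture only requires writing (or re-ordering) the per-operation maps, not re-deriving the full kernel by hand.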
The experiments evaluate GP kernels obtained by taking the infinite-width limit of GCNs and other GNN architectures. The primary goal is to show that these GPs match GNNs in prediction accuracy while being significantly faster to compute.
The experiments were conducted on multiple benchmark datasets covering both classification and regression tasks.
Prediction Performance (GCN-Based Comparison): the GP kernels are compared against a baseline and its "-X" variant, which uses a Nyström approximation for scalability. The remaining reported results cover:
- Performance metrics
- Experiment 1: comparing GCNs and GPs
- Experiment 2: comparing with other GNNs
- Running time comparison
- Scalability analysis
- Analysis on the depth
- Analysis on the landmark set (a Nyström sketch follows this list)
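Since the landmark analysis concerns the Nyström-style low-rank approximation mentioned above, here is a generic sketch of that idea; the uniform landmark sampling and the jitter value are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def nystrom_factor(K, m, seed=0, jitter=1e-8):
    """Nystrom approximation K ~= F @ F.T from m landmark columns.

    Landmarks are sampled uniformly here (an assumption; the paper's
    landmark selection may differ). With m << n this reduces GP
    inference cost from O(n^3) to roughly O(n m^2).
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)        # landmark set
    K_nm = K[:, idx]                                  # (n, m) cross-covariance
    K_mm = K[np.ix_(idx, idx)] + jitter * np.eye(m)   # (m, m) landmark block
    L = np.linalg.cholesky(K_mm)
    return np.linalg.solve(L, K_nm.T).T               # F = K_nm @ inv(L).T
```

Increasing the landmark set size m tightens the approximation at higher cost, which is the trade-off such an analysis would examine.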
The experiments show that GP kernels derived from GCNs can achieve comparable prediction performance to GNNs while being computationally more efficient. This demonstrates the practicality of the proposed GP approach for large-scale and deep graph-based tasks.
I have reviewed this paper but found it challenging to fully comprehend the mathematical proofs and rigorous details presented in the equations. If anyone could provide insights or comments, I would greatly appreciate it :)
"to fully comprehend the mathematical proofs" --> as you mentioned in https://github.com/AIML-K/GNN_Survey/issues/7#issuecomment-2430977827, understanding the NN and GP equivalence will help you understand this. In particular, papers on infinitely wide NNs being equivalent to GPs will be useful.
Don't go straight to the original source (Radford M. Neal. Priors for infinite networks. In Bayesian Learning for Neural Networks, pages 29–53. Springer, 1996. -- it is too statistical). I recommend https://arxiv.org/abs/1711.00165 or https://openreview.net/pdf?id=rkl4aESeUH instead.
Meanwhile, your decision to attend the GP workshop will pay off.
*Graph Neural Network-Inspired Kernels for Gaussian Processes in Semi-Supervised Learning*