Classic deep learning architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) require the input data to be regular, such as 2D or 3D Euclidean grids for Computer Vision and 1D lines for Natural Language Processing.
However, real-world data beyond images and language tends to have an underlying structure that is non-Euclidean. Such complex data commonly occurs in science and engineering, and can be modeled intuitively by heterogeneous graphs. Prominent examples include graphs of molecules, 3D meshes in computer graphics, social networks and biological networks.
Obtaining insights from large and complex graph-structured datasets poses an interesting challenge for machine learning architectures: the popular CNN and RNN models need to be redesigned to handle non-Euclidean data, as they cannot leverage familiar regularities such as grid structure, locality, or a fixed ordering of the input. In other words, the prior knowledge these architectures build in for Euclidean data no longer applies.
Graph/Geometric Deep Learning is an umbrella term for emerging techniques attempting to generalize deep neural networks to non-Euclidean domains such as graphs and manifolds [Bronstein et al., 2017].
These Graph Neural Network (GNN) architectures are used as backbones for challenging applications in a myriad of domains, including chemistry, social networks, recommendation systems and computer graphics.
Each GNN layer computes $d$-dimensional representations for the nodes/edges of the graph through recursive neighborhood diffusion (a.k.a. message passing), where each graph node gathers features from its neighbors to represent the local graph structure. Stacking $L$ GNN layers allows the network to build node representations from the $L$-hop neighborhood of each node.
Let $h_{i}^{l}$ denote the feature vector at layer $l$ associated with node $i$. The updated features $h_{i}^{l+1}$ at the next layer $l+1$ are obtained by applying non-linear transformations to the central feature vector $h_{i}^{l}$ and the feature vectors $h_{j}^{l}$ for all nodes $j$ in the neighborhood of node $i$ (defined by the graph structure). This ensures that the transformation builds local receptive fields, as in standard ConvNets for computer vision, and is invariant to both graph size and vertex re-indexing.
Thus, the most generic version of a feature vector $h_{i}^{l+1}$ at vertex $i$ at the next layer in the GNN is:
$$ h_{i}^{l+1}=f\left(h_{i}^{l},\ \{h_{j}^{l}: j \to i\}\right), $$
where $j \to i$ denotes the set of neighboring nodes $j$ pointing to node $i$; it can be replaced by $j \in \mathcal{N}_{i}$, the set of neighbors of node $i$, if the graph is undirected.
In other words, a GNN is defined by a mapping $f$ taking as input a vector $h_{i}^{l}$ - the feature vector of the center vertex - as well as an unordered set of vectors $\{h_{j}^{l}\}$ - the feature vectors of all neighboring vertices. The particular choice of the mapping $f$ defines an instance of a class of GNNs, e.g. GCN, GraphSage, GIN.
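Before looking at concrete instantiations, here is a minimal sketch of such a generic layer (a hedged illustration in PyTorch; the `neighbors` adjacency-list structure and the mean-aggregation choice of `f` below are assumptions made for the example, not part of any specific model):

```python
import torch
import torch.nn as nn

def gnn_layer(h, neighbors, f):
    """Generic message-passing step: h_i^{l+1} = f(h_i^l, {h_j^l : j -> i}).

    h         : (num_nodes, d) node features at layer l
    neighbors : neighbors[i] is the list of nodes j with an edge j -> i
    f         : permutation-invariant update of (center vector, neighbor matrix)
    """
    return torch.stack([f(h[i], h[neighbors[i]]) for i in range(h.size(0))])

# One possible choice of f: mean-aggregate the neighbors, concatenate with the
# center node, then apply a learned linear map + ReLU
# (assumes every node has at least one in-neighbor).
d = 8
W = nn.Linear(2 * d, d)
f = lambda h_i, h_js: torch.relu(W(torch.cat([h_i, h_js.mean(dim=0)])))

# Tiny usage example: a directed 3-node graph with edges 2 -> 0, 0 -> 1, 0 -> 2, 1 -> 2
h = torch.randn(3, d)
neighbors = [[2], [0], [0, 1]]
h_next = gnn_layer(h, neighbors, f)  # (3, d) updated node features
```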
As an illustration, here's a simple-yet-effective Graph ConvNet from Sukhbaatar et al., 2016:
$$ h_{i}^{l+1}=\text{ReLU}\left(U^{l}h_{i}^{l}+\sum_{j \in \mathcal{N}_{i}}V^{l}h_{j}^{l}\right), $$
where $U^{l},V^{l} \in \mathbb{R}^{d \times d}$ are the learnable parameters.
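A minimal PyTorch sketch of this layer, assuming a dense 0/1 adjacency matrix `adj` with `adj[i, j] = 1` whenever $j$ is a neighbor of $i$ (an illustrative reading of the equation, not the authors' reference code):

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """Isotropic Graph ConvNet layer:
    h_i^{l+1} = ReLU(U h_i^l + sum_{j in N(i)} V h_j^l)."""

    def __init__(self, d):
        super().__init__()
        self.U = nn.Linear(d, d, bias=False)  # transform for the center node
        self.V = nn.Linear(d, d, bias=False)  # transform shared by all neighbors

    def forward(self, h, adj):
        # h: (num_nodes, d) node features, adj: (num_nodes, num_nodes) 0/1 float matrix
        # adj @ self.V(h) computes, for each node i, the sum over its neighbors j of V h_j
        return torch.relu(self.U(h) + adj @ self.V(h))
```

The dense adjacency matrix makes the correspondence to the equation explicit, at the cost of $O(n^{2})$ memory; practical implementations typically use sparse message passing instead.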
In a recent paper on benchmarking GNN architectures, we introduced block diagrams to intuitively describe feature update equations such as the one above:
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. As the field grows, it becomes critical to identify key architectures and validate new ideas that generalize to larger, more complex datasets.
Unfortunately, it has been increasingly difficult to gauge the effectiveness of new models in the absence of a standardized benchmark with consistent experimental settings. In this paper, we introduce a reproducible GNN benchmarking framework, with the facility for researchers to add new models conveniently for arbitrary datasets. We demonstrate the usefulness of our framework by presenting a principled investigation into the recent Weisfeiler-Lehman GNNs (WL-GNNs) compared to message passing-based graph convolutional networks (GCNs) for a variety of graph tasks, i.e. graph regression/classification and node/link prediction, with medium-scale datasets.
As graphs have no specific orientations (like up, down, left, right directions in images), message-passing layers such as Sukhbaatar’s Graph ConvNet are isotropic, treating all neighbors as equally important. However, this may not be true in general, e.g. in social network graphs, neighbors in the same community share different relationships and information compared to neighbors from separate communities.
Isotropic GNNs can be upgraded to make the diffusion process anisotropic through mechanisms which learn to weight neighbors based on their relative importance. For example, Marcheggiani and Titov, 2017 upgrade Graph ConvNets by introducing edge gating for learning information flow on the graph structure for the task at hand:
$$ h_{i}^{l+1}=\text{ReLU}\left(U^{l}h_{i}^{l}+\sum_{j \in \mathcal{N}_{i}}\eta_{ij} \odot V^{l}h_{j}^{l}\right), $$
where $\eta_{ij}=\sigma\left(A^{l}h_{i}^{l}+B^{l}h_{j}^{l}\right)$, $U^{l}, V^{l}, A^{l}, B^{l} \in \mathbb{R}^{d \times d}$ are the learnable parameters, $\sigma$ is the sigmoid function, $\odot$ is the element-wise product, and the $\eta_{ij}$ act as edge gates.
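The same dense-tensor style can express this gated layer; the sketch below materializes all $n \times n$ $d$-dimensional gates for clarity, so it is only meant for small graphs (again an illustrative reading, not the original implementation):

```python
import torch
import torch.nn as nn

class GatedGraphConvLayer(nn.Module):
    """Anisotropic, edge-gated Graph ConvNet layer:
    h_i^{l+1} = ReLU(U h_i + sum_{j in N(i)} eta_ij * V h_j),
    eta_ij = sigmoid(A h_i + B h_j)."""

    def __init__(self, d):
        super().__init__()
        self.U = nn.Linear(d, d, bias=False)
        self.V = nn.Linear(d, d, bias=False)
        self.A = nn.Linear(d, d, bias=False)
        self.B = nn.Linear(d, d, bias=False)

    def forward(self, h, adj):
        # h: (n, d) node features, adj: (n, n) 0/1 float adjacency matrix
        # eta[i, j] = sigmoid(A h_i + B h_j): one d-dimensional gate per (i, j) pair
        eta = torch.sigmoid(self.A(h).unsqueeze(1) + self.B(h).unsqueeze(0))  # (n, n, d)
        # gated neighbor messages, masked so that only real edges contribute
        msgs = (adj.unsqueeze(-1) * eta * self.V(h).unsqueeze(0)).sum(dim=1)  # (n, d)
        return torch.relu(self.U(h) + msgs)
```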
Other prominent approaches to introduce anisotropy into GNNs include GAT, which uses the attention mechanism from NLP, as well as MoNet, which relies on Gaussian mixture models of graph connectivity. Through our benchmark, we found anisotropic aggregation to be a key property of powerful GNNs.
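For a flavor of the attention-based route, here is a minimal single-head sketch in the spirit of GAT (dense adjacency with self-loops assumed so every node attends to at least itself; a simplified illustration, not the reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadAttentionLayer(nn.Module):
    """GAT-style anisotropic layer: neighbors are weighted by learned attention
    scores alpha_ij = softmax_j(LeakyReLU(a^T [W h_i || W h_j]))."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)
        self.a_src = nn.Linear(d_out, 1, bias=False)  # "center" half of the attention vector a
        self.a_dst = nn.Linear(d_out, 1, bias=False)  # "neighbor" half of the attention vector a

    def forward(self, h, adj):
        # h: (n, d_in) node features, adj: (n, n) 0/1 adjacency matrix with self-loops
        Wh = self.W(h)                                       # (n, d_out)
        e = F.leaky_relu(self.a_src(Wh) + self.a_dst(Wh).T)  # (n, n) raw pairwise scores
        e = e.masked_fill(adj == 0, float('-inf'))           # keep scores only on real edges
        alpha = torch.softmax(e, dim=1)                      # normalize over each node's neighbors
        return F.elu(alpha @ Wh)                             # attention-weighted sum of neighbors
```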
Chaitanya Joshi, Vijay Prakash Dwivedi, Xavier Bresson. Spatial Graph ConvNets.