[RFC] Adjacency matrix multiplication and summation functions

🚀 Feature

This issue proposes two graph transformation functions, adj_product_graph and adj_sum_graph, corresponding to adjacency matrix multiplication and summation. Both of them are differentiable w.r.t. edge weights. The function names are tentative and more name options are welcome.

Motivation

These functions are necessary components for supporting Graph Transformer Networks (GTN).

Function proposal: `adj_product_graph`

Signature

def adj_product_graph(g_a: DGLGraph, g_b: DGLGraph, weight_column: str) -> DGLGraph:
    pass

Description (Docstring)

Construct a graph whose (weighted) adjacency matrix is the product of the two given (weighted) adjacency matrices.

In DGL, this function accepts two graphs G_A and G_B. Both G_A and G_B must have only one edge type (s_A, e_A, d_A) and (s_B, e_B, d_B), and must be simple graphs. G_A's destination node type d_A must be the same as G_B's source node type s_B. The number of nodes with type d_A in G_A must be the same as the number of nodes with type s_B in G_B.

If weight_column is specified, this function treats the edge features with name weight_column in G_A and G_B as the (scalar) edge weights.

This function returns a single graph G_C whose source node type is the same as G_A's source node type s_A, and whose destination node type is the same as G_B's destination node type d_B. As a consequence, G_C will be homogeneous if s_A is the same as d_B (in which case the number of nodes must match), or bipartite otherwise.

An edge exists between node i of type s_A and node j of type d_B in G_C iff there exists a k such that

An edge between node i of type s_A and node k of type d_A exists in G_A, and
An edge between node k of type s_B and node j of type d_B exists in G_B.

If weight_column is specified, DGL determines the weight of the edge between node i of type s_A and node j of type d_B in G_C as follows:

C_{ij} = \sum_{k: (i,k) \in G_A, (k,j) \in G_B} A_{ik} B_{kj}

Mathematically, this is equivalent to multiplying two weighted adjacency matrices.

If weight_column is specified, G_C's edge weights will be differentiable w.r.t. the two input graphs' edge weights. Putting in equations, assuming that the forward function is

C = AB

The backward function is

\nabla_A \mathcal{L} = (\nabla_C \mathcal{L}) B^\top \odot \mathbf{1}_A \\
\nabla_B \mathcal{L} = A^\top (\nabla_C \mathcal{L}) \odot \mathbf{1}_B

where \mathbf{1}_A and \mathbf{1}_B are 0-1 mask matrices indicating whether an edge exists in G_A and G_B respectively (or the non-zero entries of A and B).

Function proposal: `adj_sum_graph`

Signature

def adj_sum_graph(gs: List[DGLGraph], weight_column: str) -> DGLGraph:
    pass

Description (Docstring)

Construct a graph whose (weighted) adjacency matrix is the sum of the given (weighted) adjacency matrices.

In DGL, this function accepts a list of graphs G_i that

Must have only one edge type.
Must have the same metagraph.
Must have the same number of nodes for both source node type and destination node type.

If weight_column is specified, this function treats the edge features with name weight_column in all the graphs as the (scalar) edge weights.

This function returns a single graph G with the same metagraph as the input graphs.

An edge exists between two nodes in G iff an edge exists between the same nodes in either one of the input graphs.

If weight_column is specified, DGL determines the weight of the edge between node i and node j in G as follows:

A_{ij} = \sum_{k} A_{k,ij}

where A_{k,ij} is the edge weight between node i and j if the edge exists in G_k, or 0 otherwise.

Mathematically, this is equivalent to adding a list of weighted adjacency matrices.

If weight_column is specified, G's edge weights will be differentiable w.r.t. the two input graphs' edge weights. Mathematically, the backward function is:

\nabla_{A_k} \mathcal{L} = \nabla_A \mathcal{L} \odot \mathbf{1}_{A_k} \\

There are also other operations that can benefit from these functions. For example, the khop_graph where two input graphs are the same, and the metapath_reachable_graph where two input graphs are two subgraphs (with different edge types n_1 -> e_1 -> n_2 and n_2 -> e_2 -> n_3) in a heterogeneous graph. Currently, these operations are implemented via sparse matrix operations in scipy, which limits the performance.

Generally speaking, I think there are three scenarios involving these functions:

k-hop graphs for homogeneous graphs whose adjacency matrix is A^k. Although directly using A^k in the computation of graph neural networks may be inefficient since k-hop graphs (especially with self-loop) are usually very dense (actually, a common practice is to compute A^k X with stacked aggregation operations instead of A^k, an example is the Simplifying Graph Convolutional Networks), this operation may be useful in some other scenarios. Also, dgl has already had such an operation and these functions can make it more efficient.
A new graph whose adjacency matrix is A + A^2 + \dots + A^k or Bool(A + A^2 + \dots + A^k). In such a scenario, we want to exploit the locality of the original graph. For example, we want to sample nodes from all nodes whose distance from the central node is less than or equal to k. We can find such a process in the paper Hierarchical Graph Pooling with Structure Learning.
The scenario described by metapath_reachable_graph. We want to find a path (with length equals to 2) from node A with type a to node C with type c where the "middle node" is node B with type b. Here I describe an operation that can benefit from these functions: the cluster_pooling operation. The cluster_pooling operation is described in #2599. We can implement this operation using a heterogeneous graph with two node types (node and cluster) and three edge types (node to node, node to cluster, cluster to node). To pooling node features, we can simply apply message passing and reduce operations from nodes to clusters. To pooling edges, we can compute a metapath_reachable_graph with three edge types: cluster to node, node to node, and node to cluster. The edges within a cluster will be treated as a self-loop edge and we can apply remove_self_loops to remove them. The remaining edges are inter-cluster edges that we want to preserve.

dmlc / dgl