🚀 Feature

RAPIDS cuGraph is a high-quality library of GPU-accelerated graph algorithms (see the full list of supported algorithms). Since many of these algorithms are the backbone of graph ML, we see more and more GNNs using them as building blocks. Currently, DGL only provides CPU implementations for a limited number of them, which forces users either to copy graphs between CPU and GPU frequently or to write these algorithms from scratch. This proposal aims to initiate a discussion on how to combine the merits of both packages.
Some of the ideas come from a short discussion between @jermainewang and @jeaton32 in March 2021.
Motivation
This RFC is motivated by real use cases and community requests. Below are some notable ones.
Accelerate GNNs with graph traversal
Examples include:
TreeLSTM, which traverses graphs in topological order.
JTNN and SpEagle, which rely on a belief propagation order (BFS).
The community has also been asking for more efficient implementations (see issue). Currently, DGL provides only a very limited set of graph traversal routines, and they have CPU implementations only. Moreover, all of these use cases require batched traversal (i.e., performing multiple traversals simultaneously), which, as confirmed by Joe, aligns with cuGraph's roadmap.
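To make the batching requirement concrete, below is a minimal sketch of what "batched traversal" means with DGL's current CPU-only generators; the toy trees and the use of dgl.batch to merge them are illustrative assumptions, not part of the proposal.

import dgl

# Two toy trees; edges point from children to their parent, so a
# topological traversal visits leaves first, as TreeLSTM requires.
g1 = dgl.graph(([0, 1], [2, 2]))
g2 = dgl.graph(([1, 2], [0, 0]))

# "Batched" traversal today: merge the trees into one graph and traverse
# once; each frontier holds nodes that can be processed in parallel.
bg = dgl.batch([g1, g2])
for frontier in dgl.topological_nodes_generator(bg):
    print(frontier)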
Accelerate GNNs with mini-batch generation
Mini-batch training is an important topic in GNN research. The current training pipeline contains two steps: (1) sample the input graph to generate mini-batches in the form of (smaller) subgraphs, and (2) compute message passing on the samples. There is plenty of evidence that sampling is the major bottleneck in this pipeline. Recently in DGL, we made some initial efforts to move costly sampling steps to the GPU (see PR #2716). It is interesting to see whether we could utilize the rich subgraph extraction APIs from cuGraph to speed this up further. One example is the IGMC model, which needs a different type of mini-batch that is essentially an EgoNet extraction.
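For reference, here is a condensed sketch of the two-step pipeline, with DGL's existing dgl.sampling.sample_neighbors standing in for step (1); the toy graph, seed count, fanout, and feature size are arbitrary choices for illustration.

import dgl
import dgl.function as fn
import torch

g = dgl.rand_graph(1000, 5000)           # toy input graph
seeds = torch.randint(0, 1000, (32,))    # one mini-batch of seed nodes

# Step (1): sampling -- extract a small subgraph around the seeds.
sg = dgl.sampling.sample_neighbors(g, seeds, fanout=10)

# Step (2): message passing runs only on the sampled subgraph.
sg.ndata['h'] = torch.randn(sg.num_nodes(), 16)
sg.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_new'))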
Accelerate GNNs with random walk
Random walks are commonly used in network embedding models such as DeepWalk, node2vec, and metapath2vec. As random walk is already on cuGraph's roadmap, it is interesting to see how it could further accelerate these models.
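For context, DGL already exposes the random-walk primitive these models build on; the following small example (the graph size, start nodes, and walk length are made up) shows its current form.

import dgl
import torch

g = dgl.rand_graph(100, 500)             # toy graph
starts = torch.arange(10)                # one walk per start node

# Each row of `traces` is a walk of `length` steps plus the start node;
# -1 marks walks that stopped early at a node with no out-edges.
traces, _ = dgl.sampling.random_walk(g, starts, length=5)
print(traces.shape)                      # torch.Size([10, 6])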
Accelerate low-level operators
These operators are not user-facing but are widely used inside the system as building blocks. One example is renumbering, which is commonly applied after subgraph extraction to compact the node/edge ID space.
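To illustrate what renumbering does, here is a minimal sketch over made-up edge tensors; torch.unique with return_inverse yields the compact 0..N-1 relabeling in one call, and it is exactly this kind of utility that a cuGraph-backed implementation could accelerate.

import torch

# Edges of an extracted subgraph, still carrying the parent graph's IDs.
src = torch.tensor([10, 42, 42, 7])
dst = torch.tensor([42, 7, 10, 99])

# Relabel to a compact 0..N-1 space; `ids` maps new IDs back to old ones.
ids, inverse = torch.unique(torch.cat([src, dst]), return_inverse=True)
new_src, new_dst = inverse[: src.numel()], inverse[src.numel():]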
Pitch
At a basic level, it would already be quite helpful to make dgl.DGLGraph and cugraph.Graph interchangeable with each other by providing APIs like:
def to_cugraph(g: dgl.DGLGraph) -> cugraph.Graph:
    # Convert a DGLGraph to a cugraph.Graph.
    ...

def to_dglgraph(g: cugraph.Graph) -> dgl.DGLGraph:
    # Convert a cugraph.Graph to a DGLGraph.
    ...
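As a minimal sketch, one possible to_cugraph implementation could look like the following, assuming a PyTorch backend, a GPU-resident graph, and recent cuGraph/cuDF versions (the exact cugraph.Graph constructor arguments vary across releases); the handoff goes through DLPack to avoid host copies.

import cudf
import cugraph
import dgl
import torch

def to_cugraph(g: dgl.DGLGraph) -> cugraph.Graph:
    # Edge list as framework tensors; assumed to already live on the GPU.
    src, dst = g.edges()
    # DLPack moves the tensors into cuDF without a host round trip.
    df = cudf.DataFrame({
        'src': cudf.from_dlpack(torch.utils.dlpack.to_dlpack(src)),
        'dst': cudf.from_dlpack(torch.utils.dlpack.to_dlpack(dst)),
    })
    G = cugraph.Graph(directed=True)
    G.from_cudf_edgelist(df, source='src', destination='dst')
    return G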
Having cuGraph APIs accept DGLGraph inputs directly would further improve the user experience.
As mentioned above, the use cases from DGL call for new features in cuGraph, such as more traversal routines, batched traversal, batched random walk, etc.
A deeper integration could potentially happen at the C++ level instead of in Python. One example is using cuGraph's renumbering utility inside DGL to speed up certain operators.
Additional context
In terms of technical feasibility, the two projects already align in many aspects, albeit coincidentally. First, both frameworks support the DLPack protocol, making it easy to exchange array-like memory without copies. Second, cuGraph aims to provide APIs similar to NetworkX's, which was also one of DGL's initial design considerations.