dmlc / dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

[Roadmap] 0.6 Release Plan #2252

Closed jermainewang closed 3 years ago

jermainewang commented 3 years ago

Thanks to every contributor for the successful landing of the recent 0.5 release. Since 0.5 was such a huge update, the 0.6 cycle will focus mostly on completing and optimizing existing functionality. In particular, we see the following four areas of improvement:

Here are the initial task lists.

There will be a minor patch update in the next two or three weeks to include some functionality required by DGL-KE. Update: the 0.5.3 release is out; see the release notes here.

As usual, any comments and feedback are welcome!

osljw commented 3 years ago

A production environment demo, e.g., how to export a model to SavedModel (for the TensorFlow backend).
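
For reference, the generic TensorFlow SavedModel export pattern looks roughly like the sketch below. It is not DGL-specific (the dense adjacency is only a tensor stand-in for the graph, since a DGLGraph itself cannot appear in a SavedModel signature), which is exactly the gap a demo would need to cover.

```python
# Minimal sketch of the generic tf.saved_model workflow (not DGL-specific;
# the dense adjacency below is only an illustrative stand-in for the graph).
import tensorflow as tf

class TinyGNN(tf.Module):
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([in_feats, out_feats]))

    @tf.function(input_signature=[
        tf.TensorSpec([None, None], tf.float32),  # adjacency (tensor stand-in)
        tf.TensorSpec([None, 16], tf.float32),    # node features
    ])
    def __call__(self, adj, feat):
        # one round of message passing: aggregate neighbours, then project
        return tf.matmul(tf.matmul(adj, feat), self.w)

module = TinyGNN(in_feats=16, out_feats=8)
tf.saved_model.save(module, "exported_gnn")   # writes the SavedModel directory
restored = tf.saved_model.load("exported_gnn")
```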

Smilenone commented 3 years ago

When will DGL release binary packages that support CUDA 11?

Aceticia commented 3 years ago

Thanks for the work. Will dgl include hierarchical pooling layers in nn?

jermainewang commented 3 years ago

Thanks for the work. Will dgl include hierarchical pooling layers in nn?

@Aceticia Would you like to provide some pointers?

jermainewang commented 3 years ago

When will DGL release binary packages that support CUDA 11?

@Smilenone We are working on it. We've tested CUDA 11.1 but there are still some issues with CUDA 11.0. You could refer to this issue for instructions to build DGL against CUDA 11.1.
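
Once a build against CUDA 11.1 is in place, a quick sanity check from Python could look like the sketch below (the toy graph and feature sizes are arbitrary; the point is just that a graph moves to the GPU and a message-passing step runs).

```python
# Quick sanity check for a DGL build with CUDA support (toy sizes,
# for illustration only).
import dgl
import dgl.function as fn
import torch

print(dgl.__version__, torch.version.cuda, torch.cuda.is_available())

# Move a tiny graph to the GPU and run one message-passing step.
g = dgl.graph(([0, 1, 2], [1, 2, 0])).to("cuda:0")
g.ndata["h"] = torch.ones(3, 4, device="cuda:0")
g.update_all(fn.copy_u("h", "m"), fn.sum("m", "h_new"))
print(g.ndata["h_new"])
```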

Aceticia commented 3 years ago

Just to name a few, some of the works I'm familiar with are:

- DiffPool: https://arxiv.org/abs/1806.08804
- ASAP: https://arxiv.org/pdf/1911.07979.pdf
- Self-Attention Pooling: https://arxiv.org/abs/1904.08082

chaitjo commented 3 years ago

Hi guys, I think something DGL should support but has been missing for a while is local pooling operations, such as Graclus, top-k pooling, min-cut pooling, differentiable pooling (sparse variant), etc. I wonder if it is on your horizon?

Competing libraries such as PyG and Spektral already support this.

mufeili commented 3 years ago

Just to name a few, some of the works I'm familiar with are:

- DiffPool: https://arxiv.org/abs/1806.08804
- ASAP: https://arxiv.org/pdf/1911.07979.pdf
- Self-Attention Pooling: https://arxiv.org/abs/1904.08082

Thank you for the pointers. I think we already have an example for DiffPool. There's a PR for an example of Self-Attention Pooling and we are aware of ASAP. We will consider supporting them in NN modules.
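
As a rough reference for what such an NN module might look like, here is a single-graph (non-batched) sketch of a SAGPool-style layer built from existing DGL pieces. The class name and behaviour are purely illustrative, not a committed API.

```python
# Illustrative sketch of a SAGPool-style top-k pooling layer for a single
# (non-batched) graph; not DGL's actual API.
import dgl
import torch
import torch.nn as nn
from dgl.nn import GraphConv

class SAGPoolSketch(nn.Module):
    def __init__(self, in_feats, ratio=0.5):
        super().__init__()
        self.score_layer = GraphConv(in_feats, 1)  # per-node attention scores
        self.ratio = ratio

    def forward(self, g, feat):
        score = self.score_layer(g, feat).squeeze(-1)   # (N,) scores
        k = max(1, int(self.ratio * g.num_nodes()))
        _, top_idx = torch.topk(score, k)                # keep the top-k nodes
        sub_g = dgl.node_subgraph(g, top_idx)            # induced subgraph
        # gate the kept features by their squashed scores, as in SAGPool
        sub_feat = feat[top_idx] * torch.tanh(score[top_idx]).unsqueeze(-1)
        return sub_g, sub_feat

# toy usage
g = dgl.add_self_loop(dgl.rand_graph(10, 30))  # GraphConv rejects 0-in-degree nodes
feat = torch.randn(10, 16)
sub_g, sub_feat = SAGPoolSketch(16, ratio=0.5)(g, feat)
```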

hetong007 commented 3 years ago

@chaitjo We are working on adding example implementations for a series of graph pooling models. There is already an implementation for SAGPool (code), and the current ongoing effort is on HGP-SL. These two examples are expected to be included in our 0.6 release.

Afterwards, we would also like to implement more in this domain. Could you please list the works you mentioned with links/names for us to locate and consider? Thanks!

chaitjo commented 3 years ago

I'd personally be interested in some baseline/classic graph clustering techniques, such as Graclus clustering. Besides the new methods you've mentioned, I am also aware of DiffPool (I wonder if a sparse variant is even possible?) and Graph Memory Networks.

hetong007 commented 3 years ago

Thanks for the recommendations; we'll add them to the candidate pool of to-do models.

chaitjo commented 3 years ago

Here's another experience I had while using DGL, which I'd like to share with the developer community:

I was working on a project involving reinforcement learning on mini-batches of graphs. These mini-batches were not read from memory but generated on the fly. I had been using dense implementations of GNNs for this project, but I ported my code over to DGL because I wanted to leverage sparsity while handling very large graphs (impossible in the dense convolution format due to GPU memory explosion). I expected that DGL would significantly speed up my GNNs, and it absolutely did. However, the bottleneck then shifted to the data generation process, where I was repeatedly constructing DGLGraph objects to feed the GNNs. Compared to my previous pipeline, where I simply used numpy/torch's random functions to define graphs, the additional conversion to DGLGraph objects slowed the code down significantly. (Here's the paper, and you can read a bit more in Appendix B.)

Now, I understand this stuff is completely anecdotal and probably applies only to the specific use-case of mini-batch graph processing where graphs are generated on the fly. I just wanted to share it with the developers, nonetheless.
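
To make the setup concrete, here is a rough sketch of the on-the-fly pattern (all sizes and the random-graph recipe are made up for illustration); the per-sample DGLGraph construction marked below is the step that became the bottleneck.

```python
# Illustrative sketch of on-the-fly mini-batch generation (toy sizes).
import numpy as np
import torch
import dgl

def random_edges(n_nodes=50, n_edges=200):
    # the original pipeline defined graphs purely with numpy/torch randomness
    src = np.random.randint(0, n_nodes, n_edges)
    dst = np.random.randint(0, n_nodes, n_edges)
    return src, dst

def make_batch(batch_size=32, n_nodes=50):
    graphs = []
    for _ in range(batch_size):
        src, dst = random_edges(n_nodes)
        # the extra conversion step: numpy edges -> DGLGraph
        g = dgl.graph((torch.from_numpy(src), torch.from_numpy(dst)),
                      num_nodes=n_nodes)
        g.ndata["x"] = torch.randn(n_nodes, 16)
        graphs.append(g)
    return dgl.batch(graphs)  # one batched graph fed to the GNN

bg = make_batch()
print(bg.batch_size, bg.num_nodes(), bg.num_edges())
```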

jermainewang commented 3 years ago

@chaitjo This is in fact on our radar. We have previously identified some issues, such as too many sanity checks on the Python side. Other overheads come from the merge of DGLHeteroGraph and DGLGraph in 0.5, which introduced additional string type-name lookups during construction. We are working on them and will definitely improve this in 0.6. We really appreciate the support!
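
For anyone who wants to track the improvement, a tiny micro-benchmark along these lines is enough (sizes are arbitrary; the numbers are only meaningful as relative comparisons across DGL versions).

```python
# Micro-benchmark sketch for graph-construction overhead (arbitrary sizes;
# compare numbers across DGL versions rather than reading them absolutely).
import time
import torch
import dgl

n_nodes, n_edges, n_trials = 1000, 5000, 1000
src = torch.randint(0, n_nodes, (n_edges,))
dst = torch.randint(0, n_nodes, (n_edges,))

start = time.perf_counter()
for _ in range(n_trials):
    dgl.graph((src, dst), num_nodes=n_nodes)
elapsed = time.perf_counter() - start
print(f"{elapsed / n_trials * 1e3:.3f} ms per DGLGraph construction")
```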

chaitjo commented 3 years ago

@jermainewang Awesome, and thank you for the great work!