[Roadmap] v0.4 release tracker

jermainewang commented 5 years ago

Tentative release date: 09/30

[Feature] Heterogenous graph

This has been a high-demanding feature since the birth of DGL. It is finally the time to push for this. v0.4 will be majorly about this support, this includes but not limited to:

A new DGLHeteroGraph class and relevant APIs.
Adapt our message passing kernels to heterograph scenarios.
Sampling on heterograph.
Model demonstration
- RGCN
- GCMC
- One metapath-based model (e.g. HAN)

Tracker

[x] API implementation (PR #657 #824 )
[x] Model examples
- [x] RGCN (@jermainewang )
- [x] GCMC (@jermainewang )
- [x] Metapath-based example (@BarclayII @mufeili )
[x] Tutorial

[Feature] Global pooling module (Done in v0.3.1)

Our current graph pooling (readout) support is limited, with only basic sum/max readout operation. In v0.4, we want to enrich this part.

[Feature] Enrich NN modules (Mostly done in v0.3.1)

Tracker

[ ] Missing MXNet modules (@yzh119 )

@yzh119 please update.

[Feature] Unified graph data format

The idea is to define our own data storage format and provide easy utilities to convert, load, save to/from such format. (RFC #758 )

Tracker:

[x] Define the data format (PR #728 )
[x] Efficient loader and saver (PR #728 )
[x] Host popular dataset in this format (@VoVAllen )

[Application] Knowledge base embedding

Tracker:

[x] Implementation of popular network embedding methods
- [x] TransX, TransE, CompleX
- [x] DistMult
- [ ] RotateE
[x] Dataset
- [x] FB16K
- [ ] Full freebase
[x] Release training script of
- [x] Single machine multi-process
- [x] Single machine multi-GPU
[ ] Release of pre-trained embeddings

Other

[ ] Update tutorial (at glance and message passing tutorial). Shift the focus to built-in functions and NN modules.

Postpone to v0.5

[Feature] Distributed KVStore for embeddings: We wish to implement our own distributed KVStore that can store embeddings on multiple machines. If it's too rush, we could postpone this to the next cycle.

mufeili commented 5 years ago

For the pooling module, shall we also support common clustering algorithms (KNN, spectral clustering, ...)?

jermainewang commented 5 years ago

For the pooling module, shall we also support common clustering algorithms (KNN, spectral clustering, ...)?

I think we will focus on DL-based pooling methods in this release. For KNN and spectral, I would suggest converting our graph to numpy/scipy and use sklearn. If the conversion could be handled carefully (probably with zero-copy support), it should be very efficient.

yzh119 commented 5 years ago

Should GraphSage also be included in NN modules? And Set Transformer, is also a kind of graph pooling mechanism, if we have time we could try this.

aksnzhy commented 5 years ago

The CPU-based kvstore can be released in 0.4. The GPU-direct kvstore could be in the next cycle.

tbright17 commented 5 years ago

The self-attention graph pooling is simpler but more powerful than diffpool: https://arxiv.org/abs/1904.08082. Could be good if it's included.

HQ01 commented 5 years ago

The self-attention graph pooling is simpler but more powerful than diffpool: https://arxiv.org/abs/1904.08082. Could be good if it's included.

Just want to mention that there is an inconsistency between DiffPool's reported experiment results and Self-attention graph pooling paper's reported DiffPool results though.

tbright17 commented 5 years ago

The self-attention graph pooling is simpler but more powerful than diffpool: https://arxiv.org/abs/1904.08082. Could be good if it's included.

Just want to mention that there is an inconsistency between DiffPool's reported experiment results and Self-attention graph pooling paper's reported DiffPool results though.

Wow the gap is really big...

mufeili commented 5 years ago

Depending on our bandwidth, we may want to add examples for three important applications:

Molecule Property Prediction: Molecular graphs are probably among the most important applications for small graphs. For this area, Neural Message Passing for Quantum Chemistry can be a good example candidate. During our discussion with Tencent Alchemy team, this model has achieved the best performance among previous work on the quantum chemistry tasks they are interested. It has also been previously mentioned in the discussion forum here. I will take this.
Point Cloud: An important topic for constructing graphs over non-graph data and bridging graph computing with CV and graphics, as mentioned in Issue # 719.
Geometry/3D data: The latest wave of deep learning on graphs has a strong correlation with geometric data and can be collectively considered as geometric deep learning. There can be a high interest of applying graph neural networks to more general geometric data, as mentioned in a discussion thread before.

jermainewang commented 5 years ago

Changed the draft to a progress tracker. The target release date is 09/30.

For all committers @zheng-da @szha @BarclayII @VoVAllen @ylfdq1118 @yzh119 @GaiYu0 @mufeili @aksnzhy @zzhang-cn @ZiyueHuang , please vote with +1 if you agree with this plan.

aksnzhy commented 5 years ago

@jermainewang Actually the kvstore has been finished and we have already finish a demo to training distributed DistMult on FB15k data. If we should release this demo on 0.4?

jermainewang commented 5 years ago

@jermainewang Actually the kvstore has been finished and we have already finish a demo to training distributed DistMult on FB15k data. If we should release this demo on 0.4?

Yes. Let's push for the feature, but it's OK if we think it needs more time to polish and we could highlight it in v0.5.

jermainewang commented 5 years ago

v0.4 has been released. Thanks everyone for the support.

dmlc / dgl