dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0
13.43k stars, 3k forks

[Roadmap] v0.5 release plan #930

Closed: jermainewang closed this issue 4 years ago

jermainewang commented 4 years ago

v0.5 Release

[Enhancement] Distributed GNN training

[Enhancement] Core API refactor

[Doc] Document improvement

[Model] Recsys models

Similar to the DGL-KE and DGL-Chem packages, we'd like to organize popular models and datasets for recommender systems into one package. We'd also like to take this chance to add CI and other relevant support to DGL-KE and DGL-Chem.

[Feature] I/O

[Enhancement] NN modules

Maybe later (after v0.5)

v0.4.3 Release (Date: 03/31/2020)

Tracker: #1388

v0.4.2 Release (Date: 01/23/2020)

Feel free to reply and comment.

futurely commented 4 years ago

[Enhancement] Distributed training usability enhancement. Extract generic distributed training tools from the examples and move them into the core. https://github.com/dmlc/dgl/issues/864

[Feature] Distributed graph store can be used not just by distributed KGE (knowledge graph embedding) training, but also by arbitrary graph embedding training. https://github.com/dmlc/dgl/issues/869
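The graph store idea above (a sharded store of node embeddings that any embedding trainer can pull rows from and push updates to) can be sketched in a single process as follows. This is a minimal illustration under stated assumptions: `PartitionedEmbeddingStore`, the modulo partitioning by node ID, and the plain SGD update are all hypothetical choices, not a DGL API; in a real deployment each partition would live on a separate server.

```python
from typing import Dict, List


class PartitionedEmbeddingStore:
    """Hypothetical store: node embeddings sharded across `num_parts` partitions.

    Every partition is a dict in the same process here; a distributed version
    would place each partition on a different machine.
    """

    def __init__(self, num_nodes: int, dim: int, num_parts: int) -> None:
        self.dim = dim
        self.num_parts = num_parts
        self.parts: List[Dict[int, List[float]]] = [dict() for _ in range(num_parts)]
        for nid in range(num_nodes):
            self.parts[self.owner(nid)][nid] = [0.0] * dim

    def owner(self, nid: int) -> int:
        # Assumed partitioning rule: partition p owns every ID with id % num_parts == p.
        return nid % self.num_parts

    def pull(self, ids: List[int]) -> List[List[float]]:
        # Route each lookup to the owning partition and return copies of the rows.
        return [list(self.parts[self.owner(i)][i]) for i in ids]

    def push(self, ids: List[int], grads: List[List[float]], lr: float) -> None:
        # Apply a plain SGD step on the partition that owns each row.
        for nid, grad in zip(ids, grads):
            row = self.parts[self.owner(nid)][nid]
            for k in range(self.dim):
                row[k] -= lr * grad[k]


store = PartitionedEmbeddingStore(num_nodes=10, dim=2, num_parts=3)
store.push([4], [[1.0, -2.0]], lr=0.5)
print(store.pull([4]))  # [[-0.5, 1.0]]
```

Because the trainer only sees `pull` and `push`, the same store could serve knowledge graph embeddings, DeepWalk-style node embeddings, or any other embedding model.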

[Feature] Support TensorFlow. https://github.com/dmlc/dgl/issues/422

[Feature] Support Keras API. https://github.com/dmlc/dgl/issues/909

VoVAllen commented 4 years ago

Serialization of HeteroGraph should also be included.

VoVAllen commented 4 years ago

Adding https://github.com/dmlc/dgl/issues/939 to see whether we could do this with NodeFlow.

VoVAllen commented 4 years ago

Also float16 support (just for the record; no intention to add it to v0.5).
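One reason float16 support needs care is that accumulating many values in half precision loses information quickly. The sketch below emulates float16 rounding with Python's `struct` half-precision format code `"e"`; the helpers `to_f16` and `sum_in_f16` are illustrative, and real kernels would typically keep float16 storage but accumulate in float32.

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_in_f16(values):
    """Accumulate `values`, rounding the running total to float16 after each add."""
    total = 0.0
    for v in values:
        total = to_f16(total + v)
    return total


ones = [1.0] * 4000
print(sum(ones))         # 4000.0 in float64
print(sum_in_f16(ones))  # 2048.0: above 2048 the float16 spacing is 2, so +1 rounds away
```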

zheng-da commented 4 years ago

By serializing HeteroGraph, do you mean defining a format to store a HeteroGraph in a file?

VoVAllen commented 4 years ago

@zheng-da Yes
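As a rough illustration of what such a file format has to record, the sketch below stores a heterograph as its node count per node type plus one (src, dst) ID list per canonical edge type `(src_type, edge_type, dst_type)`, serialized as JSON. The schema and the `save_hetero`/`load_hetero` helpers are hypothetical, not DGL's on-disk format; a production format would likely use a compact binary encoding and also store node and edge features.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple

CanonicalEtype = Tuple[str, str, str]  # (src_type, edge_type, dst_type)
Edges = Tuple[List[int], List[int]]    # parallel src/dst node ID lists


def save_hetero(path: str, num_nodes: Dict[str, int],
                edges: Dict[CanonicalEtype, Edges]) -> None:
    payload = {
        "version": 1,  # lets a future reader detect format changes
        "num_nodes": num_nodes,
        # JSON keys must be strings, so each canonical edge type is stored as a list.
        "relations": [
            {"etype": list(cet), "src": src, "dst": dst}
            for cet, (src, dst) in edges.items()
        ],
    }
    with open(path, "w") as f:
        json.dump(payload, f)


def load_hetero(path: str):
    with open(path) as f:
        payload = json.load(f)
    edges = {tuple(r["etype"]): (r["src"], r["dst"]) for r in payload["relations"]}
    return payload["num_nodes"], edges


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.json")
    save_hetero(path, {"user": 3, "game": 2}, {
        ("user", "follows", "user"): ([0, 1], [1, 2]),
        ("user", "plays", "game"): ([0, 2], [1, 0]),
    })
    num_nodes, edges = load_hetero(path)
```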

jermainewang commented 4 years ago

Since there are several dependency and build issues, I suggest we release a minor patch version, v0.4.1, to fix them. It should at least include:

zheng-da commented 4 years ago

Accelerate graph query API on large graphs.
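A common way to speed up neighbor and edge-existence queries on large graphs is to keep a compressed sparse row (CSR) index, so the successors of a node are one contiguous, sorted slice instead of the result of scanning every edge. Below is a minimal sketch assuming nodes are numbered 0..N-1; `build_csr`, `successors`, and `has_edge` are illustrative names, not DGL functions.

```python
from bisect import bisect_left
from typing import List, Tuple


def build_csr(num_nodes: int, src: List[int], dst: List[int]) -> Tuple[List[int], List[int]]:
    """Return (indptr, indices); successors of u are indices[indptr[u]:indptr[u + 1]]."""
    indptr = [0] * (num_nodes + 1)
    for u in src:
        indptr[u + 1] += 1
    # Prefix sum turns per-node out-degrees into row offsets.
    for u in range(num_nodes):
        indptr[u + 1] += indptr[u]
    # Sorting edges by (src, dst) makes each row contiguous and sorted.
    indices = [v for _, v in sorted(zip(src, dst))]
    return indptr, indices


def successors(indptr: List[int], indices: List[int], u: int) -> List[int]:
    return indices[indptr[u]:indptr[u + 1]]


def has_edge(indptr: List[int], indices: List[int], u: int, v: int) -> bool:
    # Rows are sorted, so binary search answers the query in O(log deg(u)).
    lo, hi = indptr[u], indptr[u + 1]
    i = bisect_left(indices, v, lo, hi)
    return i < hi and indices[i] == v


indptr, indices = build_csr(3, [0, 0, 1, 2, 2], [2, 1, 2, 0, 1])
print(successors(indptr, indices, 0))   # [1, 2]
print(has_edge(indptr, indices, 2, 0))  # True
print(has_edge(indptr, indices, 1, 0))  # False
```

Building the index costs one sort up front, after which every query touches only the row of the queried node.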

YichengDWu commented 4 years ago

Recommender systems are a very broad topic. I was wondering where we should start. https://github.com/DeepGraphLearning/RecommenderSystems/blob/master/readingList.md

zheng-da commented 4 years ago

You are right. The initial release will include the following models:

VoVAllen commented 4 years ago

DGL.subgraph does not support pickling (NotImplementedError: SubgraphIndex pickling is not supported yet.)

VoVAllen commented 4 years ago

Pickling a DGLGraph is also slow. This might become a bottleneck when sending many objects between processes with multiprocessing.
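Objects that wrap a native handle (for example, a C-side graph index) cannot be pickled directly, which is the situation the `SubgraphIndex` error above points to. One generic Python pattern is to implement `__reduce__` so that only plain, picklable data (here, the edge ID arrays) is serialized and the handle is rebuilt on load. `EdgeListGraph` is a hypothetical stand-in, and a `threading.Lock` plays the role of the unpicklable native handle.

```python
import pickle
import threading
from array import array


class EdgeListGraph:
    """Hypothetical graph wrapper holding an unpicklable handle."""

    def __init__(self, src, dst):
        # Typed int64 arrays pickle as one contiguous byte buffer
        # rather than one Python object per node ID.
        self.src = array("q", src)
        self.dst = array("q", dst)
        # Stand-in for a native handle: Lock objects cannot be pickled.
        self._handle = threading.Lock()

    def __reduce__(self):
        # Serialize only the edge arrays; __init__ recreates the handle on load.
        return (self.__class__, (self.src, self.dst))

    def num_edges(self) -> int:
        return len(self.src)


g = EdgeListGraph([0, 1, 2], [1, 2, 0])
g2 = pickle.loads(pickle.dumps(g))
print(g2.num_edges(), list(g2.dst))  # 3 [1, 2, 0]
```

`multiprocessing` pickles task arguments with the same machinery, so a `__reduce__` like this also determines what is sent to worker processes.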

mufeili commented 4 years ago

Try to remove atomic operations and guarantee deterministic behavior, as mentioned in #908.
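The non-determinism comes from floating-point addition not being associative: when parallel threads atomically add messages into the same destination node, the order of additions (and therefore the last bits of the result) can change from run to run. A deterministic alternative is to sort edges by destination, breaking ties by edge ID, and reduce each destination's messages in that fixed order (a segment sum). The sketch below shows both points in plain Python; `segment_sum` is an illustrative helper, not a DGL kernel.

```python
from typing import List

# Float addition is order-dependent, so adding in a scheduling-dependent
# order can produce slightly different results.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False


def segment_sum(dst: List[int], msgs: List[float], num_nodes: int) -> List[float]:
    """Sum edge messages into destination nodes in a fixed, reproducible order."""
    out = [0.0] * num_nodes
    # Order edges by (destination, edge ID); the reduction order then depends
    # only on the graph, not on thread scheduling.
    order = sorted(range(len(dst)), key=lambda e: (dst[e], e))
    for e in order:
        out[dst[e]] += msgs[e]
    return out


print(segment_sum([1, 0, 1], [1.0, 2.0, 3.0], num_nodes=2))  # [2.0, 4.0]
```

On a GPU, the sorted layout lets one thread (or warp) own each destination segment, which removes the need for atomics at the cost of a sort.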

kingsaint commented 4 years ago

The R-GCN link prediction example fails on both CPU (16 GB memory) and GPU (12 GB memory) for the FB15k-237 dataset. In both cases, it tries to allocate additional memory during evaluation and fails. This needs to be resolved because R-GCN is used as a baseline in many experiments. Also, filtered metrics are not implemented. I have working code for filtered metrics and can contribute it.
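For reference, the standard filtered setting for link prediction ranks the true object of a test triple (s, r, o) against all candidate objects, but first discards candidates o' for which (s, r, o') is a known true triple (from train, validation, or test), so other correct answers do not count against the model. Below is a minimal sketch of filtered MRR and Hits@k for object corruption only, assuming a caller-supplied `score(s, r, o)` function; the function names are hypothetical. Scoring one test triple at a time keeps peak memory proportional to the number of entities, at the cost of speed.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[int, int, int]  # (subject, relation, object)


def filtered_metrics(test: List[Triple], known: List[Triple], num_entities: int,
                     score: Callable[[int, int, int], float],
                     ks: Tuple[int, ...] = (1, 3, 10)) -> Dict[str, float]:
    # Index every known true triple by (subject, relation) for fast filtering.
    true_objects: Dict[Tuple[int, int], Set[int]] = {}
    for s, r, o in known:
        true_objects.setdefault((s, r), set()).add(o)

    reciprocal_ranks = []
    hits = {k: 0 for k in ks}
    for s, r, o in test:
        target = score(s, r, o)
        filtered = true_objects.get((s, r), set())
        # Rank = 1 + number of non-true candidates scoring strictly higher
        # (ties are resolved optimistically).
        rank = 1 + sum(
            1 for cand in range(num_entities)
            if cand != o and cand not in filtered and score(s, r, cand) > target
        )
        reciprocal_ranks.append(1.0 / rank)
        for k in ks:
            hits[k] += rank <= k
    n = len(test)
    result = {"mrr": sum(reciprocal_ranks) / n}
    result.update({f"hits@{k}": hits[k] / n for k in ks})
    return result


def toy_score(s: int, r: int, o: int) -> float:
    # Toy model that simply prefers higher entity IDs.
    return float(o)


# (0, 0, 3) scores higher than the test triple but is a known true triple,
# so it is filtered out and the filtered rank is 1.
print(filtered_metrics([(0, 0, 2)], [(0, 0, 2), (0, 0, 3)], 4, toy_score))
```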

VoVAllen commented 4 years ago

Thanks @kingsaint. We will try to fix it. More details on this issue can be found at https://github.com/dmlc/dgl/issues/997#issuecomment-554210506

wenruij commented 4 years ago

Is there a schedule for the new release?

jermainewang commented 4 years ago

We don't have an exact date yet. Nevertheless, many items are already done or under construction. Therefore, you could expect several minor releases before the full v0.5 because we'd like to deliver these features ASAP. We will update this roadmap to be a tracker so it is clearer what is happening.

jermainewang commented 4 years ago

Hi all, the team had some offline discussions and we decided to split the tasks into two releases so that progress is easier to track. We plan to release a minor version, v0.4.2, on 01/23/2020, which includes important updates to documentation, TensorFlow support, kernel optimization, and application packages such as DGL-KE and DGL-Chem. All other features (and new requests) are pushed to v0.5, whose exact date has not been fixed yet. @mufeili @zheng-da please update the TODO tasks in DGL-KE and DGL-Chem. Thanks.

mufeili commented 4 years ago

@jermainewang Done.

jermainewang commented 4 years ago

Hi everyone, we decided to publish a minor release, v0.4.3, on 03/31/20 due to increasing demand for nightly builds and to ship the first release of DGL-KE and DGL-LifeSci as standalone packages. Please refer to this roadmap for more details.

Edit: Also updated the 0.5 roadmap by removing some stale and finished items in v0.4.2 and v0.4.3.

AlexMRuch commented 4 years ago

Very excited to see some temporal graphs on the roadmap! I have experience with temporal modeling with RNNs and LSTMs but not with GNNs. Looking forward to learning more!

I'm especially looking forward to support for temporal knowledge graph models like RE-Net (https://arxiv.org/pdf/1904.05530.pdf) in DGL; that would be awesome. I know pytorch_geometric has an implementation: https://github.com/rusty1s/pytorch_geometric/blob/master/examples/renet.py.

BarclayII commented 4 years ago

Here is a DGL implementation of RE-Net: https://github.com/changlinzhang/RENet

Also, the official implementation of RE-Net uses DGL: https://github.com/INK-USC/RE-Net

AlexMRuch commented 4 years ago

Ah, wow, I should have noticed that fact about the original repository. Thank you so much for the kind reminder!

mufeili commented 4 years ago

Just a reminder: we will need to make DGL-LifeSci a standalone repo in v0.5.

wenruij commented 4 years ago

Is DGL-RecSys coming soon? Is it possible to be released in July?

lv2020 commented 4 years ago

When will DGL support CUDA 11? I have run into many problems with it. Thanks.

VoVAllen commented 4 years ago

@lv2020 Which framework are you using? I think most frameworks have not released CUDA 11 builds yet.

lv2020 commented 4 years ago

I'm using PyTorch. The CUDA version on my lab's server is 11, and I don't have sudo access. I also tried to build DGL from source, but some steps still required sudo...

VoVAllen commented 4 years ago

@lv2020 I haven't seen a PyTorch release with CUDA 11. If you run into permission problems, I would suggest using conda instead of a plain Python installation. "conda install pytorch torchvision cudatoolkit=10.2 -c pytorch" together with "conda install dgl-cuda10.2 -c dglteam" should work.

lv2020 commented 4 years ago

Actually, the error is "OSError: libcublas.so.10: cannot open shared object file: No such file or directory". So I first tried "conda install -c anaconda cudatoolkit=10.1", but it didn't work. When I checked with nvidia-smi, the CUDA version was 11.0. Then I tried to build DGL from source; the main problem is that I need to install many packages that require sudo...

I also tried your commands; they didn't work either.
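For anyone hitting the same error: nvidia-smi reports the highest CUDA version the installed driver supports, not the CUDA runtime libraries a given binary links against. A newer driver can run binaries built for older CUDA versions, so a CUDA 11.0 driver can work with a build that expects CUDA 10.x libraries such as libcublas.so.10, as long as the dynamic loader can find them (a conda cudatoolkit package provides them inside the environment). A quick, standard-library-only way to check whether the loader can open a specific library is sketched below; `can_load` is an illustrative helper and the library names are examples.

```python
import ctypes


def can_load(libname: str) -> bool:
    """Return True if the dynamic loader can open `libname` in this process."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False


for name in ("libcublas.so.10", "libcudart.so.10.2"):
    print(name, "found" if can_load(name) else "NOT found")
```

Run it from the same environment (same conda env and LD_LIBRARY_PATH) that produced the OSError; a "NOT found" there reproduces the problem without importing PyTorch or DGL.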

VoVAllen commented 4 years ago

Does conda work with pytorch on your machine?

lv2020 commented 4 years ago

Thank you! I just solved the problem by reinstalling PyTorch and DGL with conda.

AlexMRuch commented 4 years ago

For what it's worth, @lv2020, it may be easier to set up these libraries in their own conda environments: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands. Glad to hear you got things set up!

hibayesian commented 4 years ago

When will DGL support distributed training on very large graphs (billions of nodes) stored on a distributed file system (e.g. HDFS)? I am also interested in whether there is a plan to integrate DGL with popular big data frameworks (e.g. Spark).

jermainewang commented 4 years ago

DGL v0.5 has been released. See the release note here: https://github.com/dmlc/dgl/releases/tag/0.5.0 . We will keep patching any bugs and problems in the next few weeks. Will open a new roadmap thread for v0.6. Thanks everyone for the discussion here!