Closed jermainewang closed 4 years ago
[Enhancement] Distributed training usability enhancement. Extract generic distributed training tools from the examples and move them into the core. https://github.com/dmlc/dgl/issues/864
[Feature] Distributed graph store can be used not just by distributed KGE (knowledge graph embedding) training, but also by arbitrary graph embedding training. https://github.com/dmlc/dgl/issues/869
[Feature] Support TensorFlow. https://github.com/dmlc/dgl/issues/422
[Feature] Support Keras API. https://github.com/dmlc/dgl/issues/909
Serialize HeteroGraph should also be included
https://github.com/dmlc/dgl/issues/939 Adding this issue, to see whether we could do this with nodeflow.
Also float16 support (Just for record, no intention to add it to 0.5)
do you mean defining a format to store HeteroGraph in a file?
@zheng-da Yes
Since there are several dependency and building issues, I suggest we release a minor patch version v0.4.1 to fix them. It should at least include:
Accelerate graph query API on large graphs.
Recommender systems are a very broad topic; I was wondering where we should get started. https://github.com/DeepGraphLearning/RecommenderSystems/blob/master/readingList.md
You are right. The initial release will include the following models:
DGL.subgraph does not support pickling (NotImplementedError: SubgraphIndex pickling is not supported yet.)
Also, pickling a DGLGraph is slow. This might be a bottleneck when sending multiple objects between processes with multiprocessing.
Try removing atomic operations and guaranteeing deterministic behaviors, as mentioned in #908 .
R-GCN link prediction fails both on CPU (16G memory) and GPU (12G memory) for the FB15k-237 dataset. In both cases, it tries to allocate additional memory during evaluation and fails. This needs to be resolved, as R-GCN is used as a baseline for many experiments. Also, filtered metrics are not implemented. I have working code for filtered metrics and can contribute it.
Thanks @kingsaint. We will try to fix it. More details on this issue can be found at https://github.com/dmlc/dgl/issues/997#issuecomment-554210506
Is there a schedule for the new release?
We don't have an exact date yet. Nevertheless, many items are already done or under construction. Therefore, you could expect several minor releases before the full v0.5 because we'd like to deliver these features ASAP. We will update this roadmap to be a tracker so it is clearer what is happening.
Hi all, the team had some offline discussions and we decided to split the tasks into two releases so the progress is more tractable. We plan to release a minor version v0.4.2 on 01/23/2020 that includes important updates on documentation, TensorFlow support, kernel optimization, and application packages such as DGL-KE and DGL-Chem. All the other features (and new requests) are pushed to v0.5, whose exact date has not been fixed yet. @mufeili @zheng-da please update the TODO tasks in DGL-KE and DGL-Chem. Thanks.
@jermainewang Done.
Hi everyone, we decided to publish a minor release v0.4.3 on 03/31/20 due to the increasing demand for nightly builds and for the first release of DGL-KE and DGL-LifeSci as standalone packages. Please refer to this roadmap for more details.
Edit: Also updated the 0.5 roadmap by removing some stale and finished items in v0.4.2 and v0.4.3.
Very excited to see some temporal graphs on the roadmap! I have experience with temporal modeling with RNNs and LSTMs but not with GNNs. Looking forward to learning more!
Especially looking forward to something for temporal knowledge graphs, like RE-Net (https://arxiv.org/pdf/1904.05530.pdf), on dgl, that would be awesome. I know pytorch_geometric has an implementation: https://github.com/rusty1s/pytorch_geometric/blob/master/examples/renet.py.
Here is a DGL implementation of RE-Net: https://github.com/changlinzhang/RENet
And also the official implementation of RE-Net uses DGL: https://github.com/INK-USC/RE-Net
Ah, wow, I should have noticed that fact about the original repository. Thank you so much for the kind reminder!
Just for a reminder, we will need to make DGL-LifeSci a stand-alone repo in v0.5.
Is DGL-RecSys coming soon? Is it possible to be released in July?
When will DGL be able to support CUDA 11? I have run into many problems with it. Thanks.
@lv2020 Which framework are you using? Most frameworks are not released with CUDA 11 for now I think.
I'm using PyTorch. But the CUDA version on my lab's server is 11, and I can't use sudo. I also tried to build DGL from source, but some steps still required sudo...
@lv2020 I haven't seen a PyTorch release with CUDA 11. If you ran into permission problems, I would suggest using conda instead of pure Python.
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
with
conda install dgl-cuda10.2 -c dglteam
should work.
In fact, the error is "OSError: libcublas.so.10: cannot open shared object file: No such file or directory". So I first tried "conda install -c anaconda cudatoolkit=10.1", but it didn't work. When I checked with nvidia-smi, I found the CUDA version is 11.0. Then I tried to build DGL from source. The main problem is that I need to install many packages that require sudo...
I also tried your command; it doesn't work either.
Does conda work with pytorch on your machine?
Thank you! I just solved the problem by reinstalling PyTorch and DGL with conda.
For what it's worth, @lv2020, it may be easier to set up these libraries in their own conda environments: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands. Glad to hear you got things set up!
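For anyone hitting the same "libcublas.so.10: cannot open shared object file" error, here is a minimal sketch for checking whether the dynamic loader in the current environment can actually find a given CUDA library. It uses only the Python standard library; the helper name can_load is made up for this illustration and is not part of DGL or PyTorch. Note that the CUDA version printed by nvidia-smi is the highest version the installed driver supports, not necessarily the runtime libraries a package was built against, so the two can legitimately differ.

```python
import ctypes


def can_load(libname: str) -> bool:
    """Return True if the dynamic loader can open `libname`.

    `can_load` is a hypothetical helper for this example only.
    """
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False


if __name__ == "__main__":
    # Library names below are the ones relevant to the error in this thread;
    # adjust them to whatever your traceback reports.
    for name in ("libcublas.so.10", "libcublas.so.11"):
        status = "loadable" if can_load(name) else "NOT loadable"
        print(f"{name}: {status}")
```

If the library is reported as not loadable inside a conda environment where cudatoolkit is installed, reinstalling PyTorch and DGL from conda into the same fresh environment (as was done above) is probably the simplest fix.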
When will DGL support distributed training for very large-scale graphs (billions of nodes) that are stored on a distributed file system (e.g. HDFS)? I am also interested in whether there is a plan to integrate DGL with popular big data frameworks (e.g. Spark).
DGL v0.5 has been released. See the release note here: https://github.com/dmlc/dgl/releases/tag/0.5.0 . We will keep patching any bugs and problems in the next few weeks. Will open a new roadmap thread for v0.6. Thanks everyone for the discussion here!
v0.5 Release
[Enhancement] Distributed GNN training
[Enhancement] Core API refactor
dgl.reverse
g.edge_ids (discussion)
[Doc] Document improvement
[Model] Recsys models
Similar to the DGL-KE and DGL-Chem packages, we'd like to organize popular models and datasets together for recommender systems. We'd also like to take this chance to add CI and other relevant support to DGL-KE and DGL-Chem.
[Feature] I/O
[Enhancement] NN modules
Maybe later (after v0.5)
v0.4.3 Release (Date: 03/31/2020)
Tracker: #1388
v0.4.2 Release (Date: 01/23/2020)
Feel free to reply and comment.