dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

[Discussion] Graph backend and Boost graph library #467

Closed · rayegun closed this issue 5 years ago

rayegun commented 5 years ago

Feature

Adding support for a graph backend like the Boost Graph Library, either in place of or in addition to the current custom implementation.

Motivations

The current graph backend doesn't support a number of important operations, most notably removing nodes and edges.

Integration with third-party libraries may also be easier; in particular, mutating the graph from external programs could be simpler with a standard library.

It would be easier to support more graph types in the future, as well as any of the algorithms already implemented in Boost that could be useful.

Potentially better performance, although the lightweight implementation already in place may well be faster.

Could support calling Boost-based graph algorithms from DGL and make it easier to integrate our own C++ code.

Alternatives

Keeping and iterating on the current backend.

Choosing another library, maybe Graph-Tools.

This is definitely a major undertaking, and I could lend my limited skills, but it would be quite beneficial for a number of my use cases.

jermainewang commented 5 years ago

Hi @Pangurbon , thank you very much for your interest and proposal. Really appreciate it.

I think the major question is what kinds of graph operations/algorithms are urgently or likely needed. The final technical solution will likely depend on the requirements. Also, users can always fall back to networkx if some operations are missing. However, such a fallback is not cheap, so we want efficient implementations for the important/common operations.

From my side, I see the following factors to be considered:

I'd like to hear more about your use cases. I think we could use this discussion to plot a clearer roadmap for this support.

rayegun commented 5 years ago

I think the major question is what kinds of graph operations/algorithms are urgently or likely needed

I'll elaborate more on the use case below. But removing edges/nodes and performing graph generation from a C++ interface, without reloading, are the major operations.

What kinds of models/applications require such operations?

The sort of model we're looking at is somewhat novel. The goal is to rapidly assess a graph using DGL, perform some removal/generation on the graph based on this assessment and then reassess. This can be done with networkx already, but this application will be benched against very heavily optimized "graph generation algorithms" (the benchmarks don't actually generate a graph, they work with faster but less info-rich representations). So loading and unloading will be too expensive. Doing that process with some sort of policy optimization is the long term goal.

I saw a paper that was somewhat similar that I will try to find, I recall it involved generating a graph with an RL agent, choosing at each branch whether to add a node, edge, and select edge anchors based on some GNN policy.

We don't know whether people are interested in combining them with deep learning.

The couple of projects we're doing involve heavy integration with outside analysis tools. These could be written with something like Boost (or even cuGraph; I'll have to look into that. GPU is ideal, but avoiding networkx will be a big speedup already).

In terms of data structures, are CSR and adjacency list enough? In terms of upper-level semantics, do we need special treatment for trees, cliques, bipartite graphs?

I don't think any other data structures are necessary. We do use tree semantics quite a bit, but I don't think special semantics are necessary.

My personal goal would be some sort of interface with DGL graph (or a new backend), where I can generate and remove parts of the graph using some faster code. The flow would be:

load graph -> inference -> mutate/remove with C++ based on DGL output -> inference -> mutate ->...-> save all (graph, DGL output) pairs for training.

I know that's quite vague (some of the vagueness is just inexperience on my part), but it'd be valuable for at least 3 different research projects we're exploring.
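To make that a bit more concrete, here is a rough sketch of the loop I have in mind. `gnn_model` and `fast_remove_edges` are placeholders for our own model and the imagined C++-backed mutation hook; neither is an existing DGL API.

```python
import dgl
import torch

# Rough sketch only: `gnn_model` and `fast_remove_edges` are placeholders
# (our own model and an imagined native mutation hook), not DGL APIs.
g = dgl.DGLGraph()
g.add_nodes(100)
src = torch.randint(0, 100, (500,))
dst = torch.randint(0, 100, (500,))
g.add_edges(src, dst)
g.ndata['feat'] = torch.randn(100, 16)

pairs = []
for step in range(10):
    edge_scores = gnn_model(g, g.ndata['feat'])         # inference with DGL (one score per edge)
    drop_eids = (edge_scores < 0).nonzero().squeeze()   # toy removal policy based on the output
    fast_remove_edges(g, drop_eids)                      # mutate the graph with fast native code
    pairs.append((g, edge_scores.detach()))              # save (graph, DGL output) pairs for training
                                                         # (in practice, snapshot the graph here)
```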

jermainewang commented 5 years ago

The sort of model we're looking at is somewhat novel. The goal is to rapidly assess a graph using DGL, perform some removal/generation on the graph based on this assessment and then reassess. This can be done with networkx already, but this application will be benched against very heavily optimized "graph generation algorithms" (the benchmarks don't actually generate a graph, they work with faster but less info-rich representations). So loading and unloading will be too expensive. Doing that process with some sort of policy optimization is the long term goal.

Thanks for sharing. I'm more convinced now. Some follow-up questions:

The couple projects we're doing involve heavy integration with outside analysis tools. These can be written with something like Boost (or cugraph even, I'll have to look into this. GPU is ideal but avoiding networkx will be a big speed up already).

We are also looking into this. There are successful efforts to implement more complicated graph algorithms on GPU (e.g., Gunrock). We are trying to port some of their solutions to DGL so we have a foundation to extend from.

My personal goal would be some sort of interface with DGL graph (or a new backend), where I can generate and remove parts of the graph using some faster code.

I'm thinking about having interfaces that allow advanced users (like you) to extend graph functionality in C++. Will that help?

For the short-term goal, we could try to quickly support node/edge removal.

rayegun commented 5 years ago

Do you wish to have autograd support for node/edge removal?

Could you expand on this a little more? I'm not sure what that would entail. I don't think it's necessary for the current plan, but I'll have to think about it.

Do you wish to adjust features when a node/edge is removed? This might be costly due to the large memory moves involved.

Changing features is definitely desirable from an interface standpoint, but not specifically during removal.

We are also looking into this. There are successful efforts to implement more complicated graph algorithms on GPU (e.g., Gunrock). We are trying to port some of their solutions to DGL so we have a foundation to extend from.

Gunrock looks very interesting. That sort of gets back to my original question, does it make sense to separate the graph backend further so that different backends can be interacted with through the same DGL interface? Similar to how Boost exposes the same interface for multiple graph representations.

I'm thinking about having interfaces that allow advanced users (like you) to extend graph functionality in C++. Will that help?

Yes, absolutely.

For the short-term goal, we could try to quickly support node/edge removal.

That would be immensely helpful. Performance is definitely something I can work towards later, but edge/node removal in Python is the real blocker here.

jermainewang commented 5 years ago

Could you expand on this a little more? I'm not sure what that would entail. I don't think it's necessary for the current plan, but I'll have to think about it.

Let's say we have a node feature matrix X. After deleting some nodes, some rows of X are deleted and it becomes X'. We then use X' for some later computation, compute some loss, and backprop to get dX'. My question is do you wish the gradient to flow back to X as well?
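To illustrate with plain PyTorch (this is a toy example, not DGL code): if X' is produced by an index-based selection, autograd already routes dX' back into the surviving rows of X, and the deleted rows simply receive zero gradient.

```python
import torch

# Toy illustration: 4 nodes with 3-dim features; "delete" node 1 by
# selecting only the surviving rows.
X = torch.randn(4, 3, requires_grad=True)
keep = torch.tensor([0, 2, 3])           # node 1 is removed
X_prime = X.index_select(0, keep)        # X' = surviving rows of X

loss = X_prime.sum()                     # some downstream loss
loss.backward()

# Gradient flows back to X: surviving rows receive dX',
# the deleted row's gradient is zero.
print(X.grad)
```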

Gunrock looks very interesting. That sort of gets back to my original question, does it make sense to separate the graph backend further so that different backends can be interacted with through the same DGL interface? Similar to how Boost exposes the same interface for multiple graph representations.

The reason why we have multiple tensor backends is to let users define UDFs in their favorite DL frameworks. Such a platform-agnostic design is not cheap -- we need to define common tensor interfaces and make sure that DGL runs on all of them both correctly and efficiently. On the graph side, we don't need to be compatible with anything, so keeping a single graph backend makes everything easier. Also, since I imagine a future where we need to support many graph operations on GPUs, TPUs, or other hardware, it would be hard to extend BGL for that.

If we look at how TensorFlow/MXNet/Torch are organized, they define their tensor interfaces, which are backed by different kernel libraries (e.g. MKL, cuDNN, cuBLAS, and their own kernels). Thus, to make an analogy, I agree that DGL should define a unified graph interface (e.g. DGLGraph, dgl.transform) that is backed by different graph operator libraries. The tricky thing is that BGL is not like cuDNN: you still need to program it, and the performance depends on how well you know the library and its optimization tricks. If such effort is to be spent, I would rather implement the operators directly in DGL.
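Purely as an illustration of that split (none of these class names exist in DGL; this is only a sketch of the analogy):

```python
# Hypothetical sketch: one DGLGraph-style frontend delegating to
# pluggable operator backends. None of these classes exist in DGL.
class GraphOpBackend:
    """Interface a graph operator library would implement."""
    def remove_edges(self, eids):
        raise NotImplementedError
    def in_degrees(self, nodes):
        raise NotImplementedError

class NativeBackend(GraphOpBackend):
    """DGL's own C++ kernels."""

class GunrockBackend(GraphOpBackend):
    """GPU kernels ported from Gunrock."""

class UnifiedGraph:
    """Frontend that exposes one API regardless of the backend plugged in."""
    def __init__(self, backend):
        self._backend = backend

    def remove_edges(self, eids):
        return self._backend.remove_edges(eids)
```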

We will try to add support for node/edge removal. Will put it on the release roadmap #450 .

rayegun commented 5 years ago

My question is do you wish the gradient to flow back to X as well?

Yes, but it’s not necessary at this stage.

We will try to add support for node/edge removal

Thanks! Do you have any sense of a timeline for the C++ API/deletion support? I understand those are big features, likely with limited use cases.

jermainewang commented 5 years ago

Hey @Wimmerer , I know it has been quite a long delay, but the node/edge removal APIs have been implemented! Check out the document here. The feature will be included in v0.3 (very soon, we are in the final wrap-up phase). So if you can wait a couple more days, you can use it in the updated package.
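For reference, usage would look roughly like the following (assuming the v0.3 `DGLGraph.remove_nodes` / `remove_edges` methods; please check the linked document for the exact signatures and how features are re-indexed on removal):

```python
import dgl
import torch

# Quick smoke test of the new removal APIs (v0.3); see the docs for
# the exact behavior, e.g. how node/edge features are re-indexed.
g = dgl.DGLGraph()
g.add_nodes(5)
g.add_edges([0, 1, 2, 3], [1, 2, 3, 4])
g.ndata['h'] = torch.arange(5, dtype=torch.float32).unsqueeze(1)

g.remove_edges([1])   # drop edge with id 1 (the 1 -> 2 edge)
g.remove_nodes([0])   # drop node 0; remaining nodes are relabeled
print(g.number_of_nodes(), g.number_of_edges())
```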

rayegun commented 5 years ago

Thanks! That was actually a very fast turnaround!

jermainewang commented 5 years ago

v0.3 has been released. Please check it out, and thanks for the discussion.