Including example implementation of Simple Graph Convolution

dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.

http://dgl.ai

Apache License 2.0

13.45k stars 3.01k forks source link

Including example implementation of Simple Graph Convolution #398

Closed Tiiiger closed 5 years ago

Tiiiger commented 5 years ago

I'd like to first thank the authors for maintaining such a general framework. I believe this project can really accelerate the development of graph learning.

In our recent paper (https://arxiv.org/pdf/1902.07153.pdf), we discover a linear model (Simple Graph Convolution, SGC) that is as effective as GCN while saving a lot of training time. We have our official implementation at https://github.com/Tiiiger/SGC. I would like to ask if there will be interests of including SGC as an example in DGL. I am also happy to help with the implementation.

One benefit of SGC is perhaps that the propagation (or message passing) step can be reduced to a preprocessing step. I am not sure what will be a clean way to support this using DGL's APIs. Can you provide some pointers on how to start?

Finally, on this special date, 元宵快乐!

mufeili commented 5 years ago

@Tiiiger The results of your paper are really impressive. Thank you for introducing your paper. I believe this is worth a highlight and we'll go back to you soon.

zheng-da commented 5 years ago

If you can preprocess the message passing step, it sounds to me that you can put the preprocessed result as node data and call apply node function.

yzh119 commented 5 years ago

Hi, I've made a rough implementation of SGC in dgl: https://gist.github.com/yzh119/7c574606b8795369242ba2679cd2fa97 @Tiiiger , @mufeili could you please check if this is equivalent to the original paper? To run the experiment on Cora, run the following:

python sgc-dgl.py --dataset cora --gpu 0

Result on Cora: test acc is 81.86±0.08 (run 5 times and take the average)

Tiiiger commented 5 years ago

@yzh119 The model performance looks correct to me.

However, due to my unfamiliarity with the API, can you help clarify how is the message passing precomputed? Or is it still computed every time during forward()?

yzh119 commented 5 years ago

In my implementation, I've not pre-computed S^K, but to iteratively applies S on X\Theta. More specifically, we implement

in L36-L46:

# h = X \Theta
h = torch.mm(h, self.weight)
for _ in range(self.K):
    # normalization by square root of src degree, h = \tilde{D}^{-1/2} h
    h = h * self.g.ndata['norm']
    self.g.ndata['h'] = h
    # message passing, h = \tilde{A} h
    self.g.update_all(fn.copy_src(src='h', out='m'),
                    fn.sum(msg='m', out='h'))
    h = self.g.ndata.pop('h')
    # normalization by square root of dst degree, h = \tilde{D}^{-1/2} h
    h = h * self.g.ndata['norm']

This might not be as efficient as precompute S, but it's still very fast since your layer is computationally efficient. I hope this helps you.

More results:

pubmed test acc: 78.74(0.08), trained for 300 epochs, 0.0262s/epoch
citeseer test acc: 71.12(0.04), trained for 300 epochs, 0.1252s/epoch

I've not tuned other hyperparameters for these tasks.

Tiiiger commented 5 years ago

I see, sounds great! Although for large graphs (like the Reddit dataset), the precomputation can really make a difference. So I think it would be nice to showcase how to to do this.

I think one way to support this feature is that when initializing the SGC object, we can precompute the propagation and store the result as a cached variable. Another way, as @zheng-da mentioned, is to store the precomputed result in the dataset (the node data).

Which one is cleaner?

Tiiiger commented 5 years ago

I went ahead to implement the cached variable version. The gist is here: https://gist.github.com/Tiiiger/1ebe88ca0844f947f43e6ed4e1f088bd

The example gist should run much faster on citeseer and cora because during training, you only need to compute the training examples. For Citeseer and Pubmed, use weight_decay 5e-5 and just train for 100 epochs should get you a decent result. If possible, @yzh119 can you help benchmark the gist I shared?

yzh119 commented 5 years ago

@Tiiiger Results (on NVIDIA V100):

cora test acc: 82.10% (weight_decay 5e-6), 0.0008s/epoch
citeseer test acc: 71.20% (weight_decay 5e-5), 0.0008s/epoch
pubmed test acc: 79.10% (weight_decay 5e-5), 0.0007s/epoch

Tiiiger commented 5 years ago

looks terrific. Shall we merge this into the repo?

Thank you for your help!

yzh119 commented 5 years ago

Of course, we are glad to merge this into DGL examples, though I was wondering if there were more elegant ways to implement SGC in DGL. You are encouraged to create a pull request with the title [Model] Simplifying Graph Convolutional Networks to add this model to our model zoo(please attach a README, ref: https://github.com/dmlc/dgl/tree/master/examples/pytorch/gcn), then our team members would review the PR and make suggestions to improve the code. Your paper is solid and impressive, we hope to see more interesting work from you and your team in the future. It's best if you could use/contribute DGL and keep in touch with our team so that we could follow your work.

yzh119 commented 5 years ago

Merged in #405