[Open] yofufufufu opened this issue 1 year ago
Hi,
Thanks for your interest in our work. Here are my answers for your reference.
Q1: Based on my understanding of the DGL GCN example, they normally compute the inverse degree norm outside the GCN computation kernel, right after data loading: https://github.com/dmlc/dgl/blob/c5e83757dea5b2ac1c3d1e1b03f47d47d43d1369/examples/pytorch/gcn/train.py#L98-L101. Carrying the norm computation inside the kernel is therefore an optional step, considering that the degree norm can be computed once and reused across different layers and training iterations.
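As an illustration of that preprocessing step, here is a minimal CUDA sketch (hypothetical names such as `in_degrees` and `norm`; not code from DGL or GNNAdvisor) that computes $d_v^{-1/2}$ once after data loading so that all layers and iterations can reuse it:

```cuda
#include <cuda_runtime.h>

// One-time preprocessing: turn each node's in-degree into the
// symmetric GCN norm d_v^{-1/2}. Run once after data loading and
// reuse the result across all layers and training iterations.
__global__ void compute_inv_sqrt_degree(const int* in_degrees,
                                        float* norm, int num_nodes) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v < num_nodes) {
        float d = (float)in_degrees[v];
        norm[v] = d > 0.0f ? rsqrtf(d) : 0.0f;  // guard isolated nodes
    }
}
// Example launch: compute_inv_sqrt_degree<<<(N + 255) / 256, 256>>>(deg, norm, N);
```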
Q2: Thanks for pointing this out. Here, for demonstration, we consider the case where $\epsilon$ defaults to 0 and the graphs have self-loop edges, as shown in the DGL GIN example https://github.com/dmlc/dgl/blob/master/examples/pytorch/gin/train.py#L147 and documented at https://docs.dgl.ai/en/0.8.x/generated/dgl.nn.pytorch.conv.GINConv.html#dgl.nn.pytorch.conv.GINConv.
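To spell out why that case needs no special handling of the center node: with a self-loop edge present, the $\epsilon = 0$ GIN update reduces to a plain neighbor sum,

$$
h_v^{(k)} = \mathrm{MLP}\Big((1+\epsilon)\,h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\Big) \;\overset{\epsilon=0}{=}\; \mathrm{MLP}\Big(\sum_{u \in \mathcal{N}(v)\cup\{v\}} h_u^{(k-1)}\Big),
$$

so a kernel that aggregates only over incoming edges already includes $h_v^{(k-1)}$ via the self-loop.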
And to improve generality, you can consider simply modifying this line https://github.com/YukeWang96/OSDI21_AE/blob/5c4f561aa27228164fc8c35d2468ea2cd9dc29ea/GNNAdvisor/GNNConv/GNNAdvisor_kernel.cu#L672 to initialize the partial aggregation results with the target node's embedding values.
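A hedged sketch of that generalization (hypothetical names and a simplified CSR layout, not the actual kernel around L672): seed each node's accumulator with $(1+\epsilon)\,h_v$ instead of zero, which removes the need for self-loop edges:

```cuda
#include <cuda_runtime.h>

// Sketch: GIN aggregation whose accumulator starts from
// (1 + eps) * h_v rather than 0, so self-loop edges are unnecessary.
__global__ void gin_aggregate(const float* input, float* output,
                              const int* row_ptr, const int* col_idx,
                              float eps, int num_nodes, int dim) {
    int v = blockIdx.x;  // one thread block per target node
    if (v >= num_nodes) return;
    for (int d = threadIdx.x; d < dim; d += blockDim.x) {
        float acc = (1.0f + eps) * input[v * dim + d];  // was: acc = 0
        for (int e = row_ptr[v]; e < row_ptr[v + 1]; e++)
            acc += input[col_idx[e] * dim + d];  // sum over neighbors
        output[v * dim + d] = acc;
    }
}
```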
Thanks for your reply. I am new to this field, so maybe I am missing some basic knowledge.
> graphs have self-loop edges
Does that mean all the datasets you use (including Citeseer, Cora, ...) have self-loop edges? If so, I have no other questions.
Thanks again for your time!
Thanks for your suggestions. The `func` in the map, whether it is `norm` or `norm_inverse`, will not change the kernel performance, since at the kernel level both are mapped to a single global memory load; see the sketch below.

Thanks for your reply, I will try your suggestions. I have learned a lot from your work; thanks again for the repository!
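To illustrate the point above, a minimal sketch (hypothetical names, not the repository's kernel): whichever array the host passes, the kernel body is the same single load followed by a multiply, so storing the norm versus its inverse cannot affect performance:

```cuda
#include <cuda_runtime.h>

// Whether `node_norm` holds the degree norm or its inverse, each
// thread issues exactly one global load for it, then multiplies.
__global__ void scale_by_norm(const float* input, float* output,
                              const float* node_norm,
                              int num_nodes, int dim) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v < num_nodes) {
        float n = node_norm[v];  // the single global memory load
        for (int d = 0; d < dim; d++)
            output[v * dim + d] = n * input[v * dim + d];
    }
}
```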
Hi Yuke, I read your paper and code. Excellent work, and thanks for the repository. I have a question about the CUDA kernels.

I can see you compute GCN forward propagation in CUDA like this: https://github.com/YukeWang96/OSDI21_AE/blob/5c4f561aa27228164fc8c35d2468ea2cd9dc29ea/GNNAdvisor/GNNConv/GNNAdvisor_kernel.cu#L389 https://github.com/YukeWang96/OSDI21_AE/blob/5c4f561aa27228164fc8c35d2468ea2cd9dc29ea/GNNAdvisor/GNNConv/GNNAdvisor_kernel.cu#L399-L406 In my opinion, 1/degree_norm_inv should be used according to the GCN paper; am I right?

And in the GIN CUDA kernel: https://github.com/YukeWang96/OSDI21_AE/blob/5c4f561aa27228164fc8c35d2468ea2cd9dc29ea/GNNAdvisor/GNNConv/GNNAdvisor_kernel.cu#L676-L689 I think the source node embedding $h^{k-1}_v$ should also be used in forward propagation according to the GIN paper. However, I can only see you aggregating the neighbors $h^{k-1}_u$. Thank you!
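For reference, the two layer updates this question compares against (as defined in the GCN and GIN papers) are:

$$
h_v^{(k)} = \sigma\Big(\sum_{u \in \mathcal{N}(v) \cup \{v\}} \frac{1}{\sqrt{\tilde{d}_u\,\tilde{d}_v}}\; W^{(k)} h_u^{(k-1)}\Big) \qquad \text{(GCN)}
$$

$$
h_v^{(k)} = \mathrm{MLP}^{(k)}\Big((1+\epsilon)\, h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\Big) \qquad \text{(GIN)}
$$

where $\tilde{d}_v$ is the degree of $v$ after adding self-loops. So the two questions amount to whether the kernel multiplies by the inverse degree norm (GCN), and whether the $(1+\epsilon)\,h_v^{(k-1)}$ term is included (GIN).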