Closed vymao closed 3 years ago

Hi,
I was wondering: what explains the large discrepancy in accuracy for the benchmark methods (GCN and GAT, for example) between Table 1 in the DGCN paper and Table 2 in the DiGCN paper? The accuracy levels seem very different for the same datasets.
Hi Victor, the differences in the results are due to two main factors. First, the experimental tasks in the two papers are different. Although DGCN uses digraph datasets, its inputs are still symmetric adjacency matrices, which is why the baselines in DGCN are close to their original results on undirected graphs. Our experiments in DiGCN, however, restrict the inputs to asymmetric adjacency matrices in order to measure performance on digraphs. Please see Sec. 5.1 for the experimental task and Sec. 6.1 for the analysis. Second, DGCN concatenates second-order in- and out-neighbor matrices to obtain neighbour features, which can significantly increase the number of edges and cause out-of-memory (OOM) problems. Because of this structure, DGCN has a larger receptive field than a plain GCN and therefore gains a small boost.
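To make the input difference concrete, here is a minimal sketch of the two conventions (the toy matrix is made up purely for illustration):

```python
import numpy as np

# A: adjacency matrix of a digraph, A[i, j] = 1 iff there is an edge i -> j.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=np.float32)

# DGCN-style input: symmetrized, so edge direction is discarded and
# undirected baselines behave much as they do on undirected graphs.
A_sym = np.maximum(A, A.T)

# DiGCN-style input: the asymmetric matrix itself, so the model must
# actually cope with direction.
A_asym = A
```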
DiGCN goes beyond DGCN not only in better performance under more stringent experimental conditions but also in better interpretability. DGCN uses simple symmetric matrices for first-order proximity in its Eq. (8) and does not explain why directed structural features can be obtained. This is why we improved substantially on DGCN and proposed DiGCN.
Thanks. I am also wondering how DIGCNConv.py relates to Equation 6 in the paper. Are you just pre-computing the k-th order proximity matrices in get_adj.py? If so, where is the rest of the digraph convolution? DIGCNConv.py only implements aggregation, which is the same operation as GCNConv.
Since Inception is a mature structure, for the sake of convenience, we only release the code for k=2 and a three-layer DiGCN model. The second-order proximity matrix is computed in get_adj.py, and the convolution operation is in DiGCN_ib.py. You can easily increase k and the number of layers following the equation.
Thanks, some follow-ups: in InceptionBlock, why is there a Linear layer, and why are there two DiGCN layers? It would also be enormously helpful if you could document how to properly set up other datasets to run this model on, what to run, etc. I have some datasets I would be interested in testing (apart from the ones in the paper).
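For context, my reading of the paper is that the Linear layer is the 0-hop (k = 0) branch of the Inception block, while the two conv layers are the first- and second-order proximity branches, whose outputs are fused. A minimal sketch of that structure (not the repo's exact DiGCN_ib.py; the fusion here is a plain sum):

```python
import torch.nn as nn

class InceptionBlockSketch(nn.Module):
    """Sketch of a DiGCN-style Inception block: one branch per proximity
    order, fused at the end. `conv_layer` stands in for DIGCNConv."""
    def __init__(self, in_dim, out_dim, conv_layer):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)     # k = 0 (identity) branch
        self.conv1 = conv_layer(in_dim, out_dim)  # k = 1 branch
        self.conv2 = conv_layer(in_dim, out_dim)  # k = 2 branch

    def forward(self, x, edge_index1, w1, edge_index2, w2):
        z0 = self.lin(x)
        z1 = self.conv1(x, edge_index1, w1)  # first-order proximity graph
        z2 = self.conv2(x, edge_index2, w2)  # second-order proximity graph
        return z0 + z1 + z2                  # simple sum fusion
```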
Thanks for your suggestion. Since our implementation is based on PyTorch Geometric, any data suitable for the PyG dataloader can be loaded; for details, please refer to the PyTorch Geometric documentation. Meanwhile, I will update the README file to give instructions.
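As a minimal sketch of what "suitable for the PyG dataloader" means here (the tensors and sizes are made up for illustration), a directed dataset is just a Data object whose edge_index is not symmetrized:

```python
import torch
from torch_geometric.data import Data

# edge_index[0] holds source nodes, edge_index[1] holds targets;
# direction is preserved, i.e., (0, 1) does not imply (1, 0).
edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 0]], dtype=torch.long)
x = torch.randn(3, 16)        # node features: 3 nodes, 16 dims
y = torch.tensor([0, 1, 0])   # node labels

data = Data(x=x, edge_index=edge_index, y=y)
```

Presumably the preprocessing in get_adj.py is then applied to this edge_index before training.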
I am familiar with the graph convolution paper, but there the operation was motivated through the graph Fourier transform (an eigenvector change of basis, then convolution, then the inverse graph Fourier transform, and so on). I'm not sure how the convolution here relates. It seems that Formula 4 of Section 2.3 is implemented for the first-order proximity; is the convolution for k >= 2 an approximation of Formula 4?
Also, how should we set the teleport parameter alpha for PageRank as a hyperparameter? I will be working with strongly connected graphs, which the paper seems to note this method isn't ideal for, but I'm also wondering why.
And finally, do you minibatch the training? I didn't see this implemented, and I am surprised that the training data can fit into GPU memory.
In my graph, the strong connectedness comes from the fact that there exists a directed cycle for any path length, not necessarily just between two nodes, and there are no isolated nodes.
With regard to alpha, there is this statement in the paper:
Likewise, another work [26] also employs this idea to solve digraph problem. However, it is defined on the strongly connected digraphs, which is not universally applicable to any digraphs. Our method can easily generalize to it by α → 0.
Unless I am mistaken, did you mean to say that strongly connected graphs should use a smaller alpha? I would still consider using DiGCN (with a particular choice of alpha) for a strongly connected graph that I have. For one, it does not seem that the original spectral digraph convolution has been implemented in code. And for two, the k-th order proximity metric seems useful here.
Also, I have a PyTorch Geometric dataset saved as individual Data files (from which the out-of-memory DataLoader loads minibatches). Would you be able to incorporate this? As it stands, it seems that one needs a compressed NumPy file.
I know that, but I am still considering DiGCN for the reasons above. Is this valid?
Yes, exactly. You can take alpha=0.
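For readers following along, here is a rough sketch of where alpha enters, assuming the standard PageRank-style teleport construction (my paraphrase; the exact formulation is in the paper). With alpha > 0 the chain is irreducible on any digraph; on a graph that is already strongly connected (and aperiodic), alpha can be driven to 0, recovering the plain random walk:

```python
import numpy as np

def teleport_transition(A: np.ndarray, alpha: float) -> np.ndarray:
    """(1 - alpha) * random-walk matrix + alpha * uniform teleport."""
    n = A.shape[0]
    deg = A.sum(axis=1, keepdims=True)
    P = np.divide(A, deg, out=np.zeros_like(A, dtype=float), where=deg > 0)
    return (1 - alpha) * P + alpha * np.ones((n, n)) / n
```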
Ok, thank you. Also, do you add self-loops to the adjacency matrix before applying the layers?
Yes, we add self-loops to guarantee the graph is aperiodic.
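In PyTorch Geometric terms, that step looks roughly like this (a sketch; see get_adj.py for how the repo actually does it):

```python
import torch
from torch_geometric.utils import add_self_loops

edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 0]], dtype=torch.long)

# A self-loop at every node makes the random walk aperiodic, which the
# digraph Laplacian construction relies on.
edge_index, _ = add_self_loops(edge_index, num_nodes=3)
```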
Also, isn't Cluster-GCN only available for undirected graphs?
Yes, Cluster-GCN is for undirected graphs. However, our digraph Laplacian is symmetric, which means you can apply Cluster-GCN on top of it.
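Concretely, one way to wire this up in PyTorch Geometric (a sketch; partitioning the symmetrized graph is my assumption, not something the authors prescribe, and ClusterData needs METIS installed):

```python
import torch
from torch_geometric.data import Data
from torch_geometric.loader import ClusterData, ClusterLoader
from torch_geometric.utils import to_undirected

# Hypothetical digraph; symmetrize the edge structure, mirroring the
# fact that the digraph Laplacian itself is symmetric.
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 0]], dtype=torch.long)
data = Data(x=torch.randn(4, 16), edge_index=to_undirected(edge_index))

cluster_data = ClusterData(data, num_parts=2)   # METIS partition
loader = ClusterLoader(cluster_data, batch_size=1, shuffle=True)
for batch in loader:
    pass  # train on each cluster subgraph
```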