allenhaozhu / SSGC

Implementation for Simple Spectral Graph Convolution in ICLR 2021

The SSGC computation is not exactly right #3

Open EdisonLeeeee opened 3 years ago

EdisonLeeeee commented 3 years ago

Hi, allenhaozhu.

I have read your paper and the code, but I found that the SSGC computation is not exactly right: emb += features is an in-place operation, and because emb is initialized as emb = features, it also modifies features. It is better to replace it with emb = emb + features. https://github.com/allenhaozhu/SSGC/blob/7ecaf2527c8c127a72c4d63b446b6097c74fcf72/utils.py#L93-L103

After making this change, the accuracy on Cora increased from 0.830 to 0.835, but it decreased from 0.734 to 0.733 on Citeseer and from 0.806 to 0.796 on Pubmed.
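For illustration, a minimal standalone sketch of the aliasing behaviour described above (not code from the repo):

import torch

features = torch.ones(2, 2)
emb = features          # emb aliases the same tensor; no copy is made
emb += features         # in-place add: this also doubles `features`
print(features)         # tensor([[2., 2.], [2., 2.]]) -- the input was modified

features = torch.ones(2, 2)
emb = features
emb = emb + features    # out-of-place add: allocates a new tensor for emb
print(features)         # tensor([[1., 1.], [1., 1.]]) -- the input is untouched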

allenhaozhu commented 3 years ago

Thank you very much for your help! You have no idea what a big favour this is to me. Let me answer some questions. Yes, this code has a bug with the in-place operation. Fortunately, I almost never used this code in my experiments. I used:

def sgc_precompute(features, adj, degree, alpha):
    t = perf_counter()
    feature_ori = features
    feature_set = torch.zeros_like(features)
    for i in range(degree):
        features = torch.spmm(adj, features)
        feature_set += (1-alpha)*features + alpha*feature_ori
        # feature_set /= degree+1
    feature_set /= degree  # build for PPNP, like 0.15 for cora and citeseer
    precompute_time = perf_counter() - t
    return feature_set, precompute_time

and I checked that there is no in-place problem when debugging in PyCharm.

  1. About the performance: I checked the history and found that Pubmed is 80.40 and Citeseer is 73.60 with this code.
  2. I have to say frankly that methods like SGC and SSGC are susceptible to weight decay. We are developing a better unsupervised method that learns a linear projection for SGC or SSGC more efficiently, without sophisticated parameter searching.
  3. I will update the code in a few days.
  4. Thanks for your attention to our work.
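For reference, a hedged usage sketch of the snippet above (only the signature comes from the code itself; the adjacency normalization and the degree/alpha values from the paper are assumptions):

# adj: sparse normalized adjacency, e.g. D^-1/2 (A + I) D^-1/2 (assumed convention)
# features: dense node-feature matrix
emb, precompute_time = sgc_precompute(features, adj, degree=16, alpha=0.05)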
EdisonLeeeee commented 3 years ago

Thanks for your kind reply.

You are right, and I totally agree with you that "SGC and SSGC are susceptible to weight decay", which also applies to the learning rate.

Looking forward to your update on the code :).

allenhaozhu commented 3 years ago

> Hi, allenhaozhu.
>
> I have read your paper and the code, but I found that the SSGC computation is not exactly right: emb += features is an in-place operation, and because emb is initialized as emb = features, it also modifies features. It is better to replace it with emb = emb + features. https://github.com/allenhaozhu/SSGC/blob/7ecaf2527c8c127a72c4d63b446b6097c74fcf72/utils.py#L93-L103
>
> After making this change, the accuracy on Cora increased from 0.830 to 0.835, but it decreased from 0.734 to 0.733 on Citeseer and from 0.806 to 0.796 on Pubmed.

By the way, because this code was only involved in the parameter analysis, I redid the experiments and got Cora 83.5, Citeseer 74.0, and Pubmed 80.2.

jackd commented 2 years ago

Update: I see the following was discussed in #2 as well.

@allenhaozhu can you confirm which version of the code you used for the results in the paper? If we assume the += issue is resolved (i.e. we use emb = emb + update), isn't the posted implementation inconsistent with the paper? Specifically, features = (1-alpha) * torch.spmm(adj, features) means the summation term carries a $(1 - \alpha)^k$ factor, whereas the paper uses a constant $(1 - \alpha)$. This is quite significant, as it is the difference between what's published and an approximate personalized PageRank implementation. Starting from the expression in the paper, but with this geometric factor in place of the constant:

$$x^\prime = \left(\frac{1}{K} \sum_{k=1}^K (1 - \alpha)^k T^k x\right) + \alpha x = \left(\frac{1}{K} \sum_{k=0}^K (1 - \alpha)^k T^k x\right) - \frac{1}{K} x + \alpha x$$

$$\approx \frac{1}{K} \left[I - (1 - \alpha) T\right]^{-1} x + \left(\alpha - \frac{1}{K}\right) x = \frac{1}{\alpha K} T_{PPR}\, x + \left(\alpha - \frac{1}{K}\right) x,$$

where the second expression adds the $k = 0$ term to the summation and subtracts it, the $\approx$ replaces the truncated sum with its Neumann-series limit, and $T_{PPR} = \alpha \left[I - (1 - \alpha) T\right]^{-1}$ is the personalized PageRank transition matrix.

Given that $\alpha = 0.05$ and $K = 16$, $\alpha \approx 1/K$, so we're left with $x^\prime \approx T_{PPR} x$.
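As a quick numerical sketch of the derivation above (a standalone toy check on a small random row-stochastic matrix, not code from this repo; the residual it prints is nonzero because the Neumann series is truncated at K):

import torch

torch.manual_seed(0)
n, K, alpha = 8, 16, 0.05
A = torch.rand(n, n)
T = A / A.sum(dim=1, keepdim=True)   # row-stochastic "transition" matrix
x = torch.rand(n, 1)

# Left-hand side: (1/K) sum_{k=1}^K (1-alpha)^k T^k x + alpha x
lhs = sum((1 - alpha) ** k * torch.matrix_power(T, k) @ x
          for k in range(1, K + 1)) / K + alpha * x

# Right-hand side: (1/(alpha K)) T_PPR x + (alpha - 1/K) x,
# with T_PPR = alpha [I - (1-alpha) T]^{-1}
T_ppr = alpha * torch.linalg.inv(torch.eye(n) - (1 - alpha) * T)
rhs = (1 / (alpha * K)) * T_ppr @ x + (alpha - 1 / K) * x

# Maximum deviation; it shrinks as K grows, showing how loose the
# truncation-at-K approximation is at K = 16
print((lhs - rhs).abs().max())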

In my view, both the version that's currently in master and the one posted in the comments above (not the OP's linked version; repeated below with formatting) are consistent with the paper.

from time import perf_counter

import torch

def sgc_precompute(features, adj, degree, alpha):
    t = perf_counter()
    feature_ori = features
    feature_set = torch.zeros_like(features)
    for i in range(degree):
        features = torch.spmm(adj, features)
        feature_set += (1-alpha)*features + alpha*feature_ori
        # feature_set /= degree+1
    feature_set /= degree  # build for PPNP, like 0.15 for cora and citeseer
    precompute_time = perf_counter() - t
    return feature_set, precompute_time
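To make the distinction concrete, here is a minimal sketch contrasting the two loop variants discussed above (function names are illustrative; adj is assumed to be a normalized sparse adjacency):

import torch

def propagate_constant(features, adj, K, alpha):
    # Paper's formula: (1/K) * sum_{k=1}^K ((1-alpha) T^k x + alpha x)
    x, out = features, torch.zeros_like(features)
    for _ in range(K):
        x = torch.spmm(adj, x)
        out = out + (1 - alpha) * x + alpha * features
    return out / K

def propagate_geometric(features, adj, K, alpha):
    # Master-style variant: (1/K) * sum_{k=1}^K (1-alpha)^k T^k x + alpha x,
    # i.e. the (1-alpha)^k factor that approximates personalized PageRank
    x, out = features, torch.zeros_like(features)
    for _ in range(K):
        x = (1 - alpha) * torch.spmm(adj, x)
        out = out + x
    return out / K + alpha * features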