danielzuegner / gnn-meta-attack

Implementation of the paper "Adversarial Attacks on Graph Neural Networks via Meta Learning".
https://www.kdd.in.tum.de/gnn-meta-attack
MIT License

Cannot reproduce the performance on citeseer and polblogs #3

HappierTreeFriend closed this issue 5 years ago

HappierTreeFriend commented 5 years ago

Hi Daniel,

I ran your code on different datasets, but I just cannot reproduce the results from the paper.

The GCN misclassification rate reported in your paper on the clean citeseer dataset is 28.5 ± 0.9; however, in my implementation (I used pygcn), the misclassification rate is about 25%.

I ran your code to generate a 5% perturbed graph and used it as input, which gives 73.7% classification accuracy. So on the citeseer dataset with 5% perturbations, the performance of the GCN model only drops by 1-2%, not by ~6% as in the paper.

I don't know what is wrong, and I am wondering whether there are additional attacker parameters I need to tune.

I would really appreciate it if you could help me with this.

danielzuegner commented 5 years ago

Hi,

thanks for creating this issue. I've investigated what might be going on. I think most of it boils down to the fact that pygcn does a few things slightly differently from what we (and, in one case, the original GCN paper; see my last point) do.

The difference in the 'clean' accuracy seems to be due to the fact that we do not perform feature normalization. When I turn off feature normalization, the performance of pygcn is pretty close to what we get.
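For reference, pygcn row-normalizes the feature matrix before training; here is a minimal sketch of that step (illustrative only, not a verbatim copy of pygcn's utils.py):

import numpy as np
import scipy.sparse as sp

def row_normalize(features):
    # Scale each row of the (sparse) feature matrix to sum to 1.
    rowsum = np.asarray(features.sum(axis=1)).flatten()
    inv = np.zeros_like(rowsum, dtype=float)
    inv[rowsum != 0] = 1.0 / rowsum[rowsum != 0]
    return sp.diags(inv) @ features

Skipping this step, i.e. feeding the raw binary features, should bring the 'clean' numbers in line with ours.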

I believe the difference in the performance drop caused by the attack is due to pygcn using L2 regularization (weight decay), which we do not use during our attack. After quickly hacking in a change to line 529 of meta_gradient_attack.py:

# Add L2 regularization (weight decay 5e-4) on the surrogate's weights and
# biases during the attack, matching pygcn's training objective.
loss = tf.reduce_mean(loss_per_node) + 5e-4 * (
    tf.nn.l2_loss(current_weights[0]) + tf.nn.l2_loss(current_weights[1]) +
    tf.nn.l2_loss(current_biases[0]) + tf.nn.l2_loss(current_biases[1]))

the performance of pygcn dropped by roughly 5 points, similar to our paper. Of course, we have to make sure to use the same train/validation/test split that was used during the attack.
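One simple way to keep the split consistent between the attack and the evaluation is to derive it from a fixed seed. A sketch (the function name is mine; any deterministic split utility works just as well):

import numpy as np

def fixed_split(num_nodes, train_frac=0.1, val_frac=0.1, seed=42):
    # Deterministic train/val/test split so the attack and the
    # evaluation see exactly the same node sets.
    rng = np.random.RandomState(seed)
    perm = rng.permutation(num_nodes)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]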

Additionally, pygcn appears to normalize the adjacency matrix differently from the original GCN paper. That is, instead of the symmetric normalization D_tilde^(-1/2) A_tilde D_tilde^(-1/2) (Eq. (2) in that paper), pygcn uses the row normalization D_tilde^(-1) A_tilde.
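To make the difference concrete, here is a small sketch of the two variants (the function is mine, not from either codebase):

import numpy as np
import scipy.sparse as sp

def normalize_adjacency(adj, symmetric=True):
    # A_tilde = A + I (add self-loops); D_tilde = degree matrix of A_tilde.
    adj_tilde = adj + sp.eye(adj.shape[0])
    deg = np.asarray(adj_tilde.sum(axis=1)).flatten()
    if symmetric:
        # GCN paper, Eq. (2): D_tilde^(-1/2) A_tilde D_tilde^(-1/2)
        d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
        return d_inv_sqrt @ adj_tilde @ d_inv_sqrt
    # What pygcn computes: D_tilde^(-1) A_tilde
    return sp.diags(1.0 / deg) @ adj_tilde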

I hope this clarification helps -- please let me know if you have any further questions.

HappierTreeFriend commented 5 years ago

Thanks so much for your instant reply!!!

Following your advice, I now get similar results on the citeseer and cora_ml datasets!

But I still cannot reproduce the results on PolBlogs. Since it does not have node features, I treat every node's features as a one-hot vector (i.e., the feature matrix is the identity), as you suggested in another issue. The 'clean' accuracy is then about 96%, while on the 'meta-self' perturbed graph the model gets an accuracy of 87%, not roughly 78% as reported in the paper.
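In case it matters, this is how I construct the features (a sketch; adj is the loaded PolBlogs adjacency matrix):

import scipy.sparse as sp

# One-hot feature vector per node: the N x N identity matrix.
features = sp.eye(adj.shape[0], format="csr")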

Did you use any other tricks to deal with the PolBlogs dataset?

danielzuegner commented 5 years ago

Hi,

I did a deep dive into the original implementation and the results we used for the paper. It seems I missed a few details when re-implementing everything from scratch for publication.

After fixing those, I re-attacked both PolBlogs and Citeseer and got results very similar to what we report in the paper. Thank you very much for pointing this out; I hope you can get the desired results now!

HappierTreeFriend commented 5 years ago

Cool!!! You've been very helpful, and I really admire you for your earnest attitude!