Hi,
Can you provide more details on how you run the files test_pgd.py, test_random.py, and test_DICE.py (the global attacks)? They work fine on my side.
> python test_pgd.py
Loading citeseer dataset...
Downloading from https://raw.githubusercontent.com/danielzuegner/gnn-meta-attack/master/data/citeseer.npz to /tmp/citeseer.npz
Done!
=== testing GCN on clean graph ===
Test set results: loss= 1.2319 accuracy= 0.5770
=== testing GCN on Evasion attack ===
100%|█████████████████████████████████████████████████████████████████████████| 200/200 [00:31<00:00, 6.37it/s]
Test set results: loss= 1.2319 accuracy= 0.5770
=== testing GCN on Poisoning attack ===
Test set results: loss= 1.2830 accuracy= 0.5430
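For reference, here is a rough sketch of what these global-attack example scripts do. Treat it as an illustration rather than the canonical code: the names (Dataset, GCN, PGDAttack, preprocess) follow the DeepRobust example scripts, but the exact keyword arguments may differ between versions.
from deeprobust.graph.data import Dataset
from deeprobust.graph.defense import GCN
from deeprobust.graph.global_attack import PGDAttack
from deeprobust.graph.utils import preprocess

device = 'cpu'  # or 'cuda' if available
data = Dataset(root='/tmp/', name='citeseer')
adj, features, labels = data.adj, data.features, data.labels
idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
# convert the scipy matrices to dense tensors, as the example scripts do
adj, features, labels = preprocess(adj, features, labels, preprocess_adj=False)

# 1. train the victim GCN and evaluate it on the clean graph
victim_model = GCN(nfeat=features.shape[1], nhid=16, nclass=labels.max().item() + 1, device=device)
victim_model.fit(features, adj, labels, idx_train, idx_val)
victim_model.test(idx_test)

# 2. run the global (untargeted) PGD attack under a 5% edge budget
n_perturbations = int(0.05 * (adj.sum() // 2))
attacker = PGDAttack(model=victim_model, nnodes=adj.shape[0], loss_type='CE', device=device)
attacker.attack(features, adj, labels, idx_train, n_perturbations)
modified_adj = attacker.modified_adj

# 3a. evasion: evaluate the already-trained GCN on modified_adj
# 3b. poisoning: retrain a fresh GCN on modified_adj, then evaluate on idx_test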
For test_nettack.py, it is a targeted attack. The evaluation process is quite different from the ones above: you should check the accuracy on the attacked (target) nodes instead of on all the test nodes.
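For example, a minimal helper for that evaluation could look like the sketch below; accuracy_on_nodes is a hypothetical function, and gcn, features, modified_adj, labels and target_nodes are placeholders for your trained model, the perturbed graph, and the nodes Nettack attacked.
import torch

def accuracy_on_nodes(output, labels, node_idx):
    # output: logits or log-probabilities of shape [n_nodes, n_classes]
    preds = output[node_idx].argmax(dim=1)
    return (preds == labels[node_idx]).float().mean().item()

# accuracy on the attacked nodes only, e.g.
# acc_targets = accuracy_on_nodes(gcn.predict(features, modified_adj), labels, target_nodes)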
Thanks for answering. I am learning global attack methods, so let's only talk about global attacks such as test_pgd.py, test_random.py, and test_DICE.py. Taking test_pgd.py as an example, the result I got is:
The experimental results are not as good as in the paper (misclassification rates (%) under 5% perturbed edges). Under the same setting, the misclassification rate I got is about 3% lower than the paper's. Can you tell me the possible reasons?
If the results above are normal (maybe caused by a different implementation), I wonder what you think about the robustness of GCN. There are 5278 edges and 2708 nodes (maximum connectivity) in the Cora dataset. We perturb 5% of the edges (about 264 edges) in the global attack, but the node-classification accuracy only decreases by 7-10%, and even when we perturb 50% of all the edges, we still get 34.5% accuracy. (There are 7 classes in Cora; randomly choosing a class for each node gives about 14% accuracy.)
Is this the power of semi-supervised learning in node classification tasks?
Q1. Performance discrepancy compared with the paper.
A1. The data splits used in the original paper are different from the ones we used. Now I have updated test_pgd.py to make them consistent.
$ python test_pgd_new.py --dataset cora --seed=0
=== testing GCN on clean graph ===
Test set results: loss= 0.7849 accuracy= 0.8130
=== setup attack model ===
100%|█████████████████████████████████████████████████████████████████████████| 100/100 [00:08<00:00, 11.28it/s]
=== testing GCN on Evasion attack ===
Test set results: loss= 1.0142 accuracy= 0.7340
=== testing GCN on Poisoning attack ===
Test set results: loss= 1.0695 accuracy= 0.7200
There is still some inconsistency between our updated performance and the performance reported in their paper. If you check the authors' repo https://github.com/KaidiXu/GCN_ADV_Train, you will find that they use a different early-stopping strategy and patience, which can greatly impact the final performance. For the other parts, we followed their code to implement the attack model, so it should be safe to use our PyTorch version.
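To illustrate why patience matters so much, here is a generic sketch of patience-based early stopping (model, optimizer, loss_fn and data are placeholders, not DeepRobust internals): training stops once the validation loss has not improved for `patience` consecutive iterations, so different patience values can easily shift the final accuracy by a few percent.
import copy
import torch

def train_with_early_stopping(model, optimizer, loss_fn, data, patience=30, max_iters=200):
    # stop when the validation loss has not improved for `patience` iterations
    best_val, best_state, wait = float('inf'), None, 0
    for _ in range(max_iters):
        model.train()
        optimizer.zero_grad()
        loss_fn(model, data, split='train').backward()
        optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model, data, split='val').item()
        if val_loss < best_val:
            best_val, best_state, wait = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            wait += 1
            if wait >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)  # restore the best weights seen on validation
    return model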
Q2. There are 7 classes in the Cora dataset; randomly choosing a class for each node gives about 14% accuracy.
A2. While there are 7 classes in Cora, the number of samples in each class is different.
ipdb> from collections import Counter
ipdb> Counter(labels[data.idx_test])
Counter({3: 319, 4: 149, 2: 144, 0: 130, 5: 103, 1: 91, 6: 64})
ipdb> 319/len(data.idx_test)
0.319
As you can see, if we guess all the labels to be 3, we can still get a performance of 31.9%. I just tested the new test_pgd.py and I also obtain 31.9% when using ptb_rate=0.5.
$ python test_pgd_new.py --dataset cora --ptb_rate=0.5
=== testing GCN on clean graph ===
Test set results: loss= 0.7677 accuracy= 0.8190
=== setup attack model ===
100%|█████████████████████████████████████████████████████████████████████████| 100/100 [00:08<00:00, 12.26it/s]
=== testing GCN on Evasion attack ===
Test set results: loss= 1.8434 accuracy= 0.4270
=== testing GCN on Poisoning attack ===
Test set results: loss= 1.8690 accuracy= 0.3190
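For reference, a standalone version of the majority-class check above looks roughly like this (assuming the same Dataset loader used in the example scripts; the exact counts depend on the data split and seed):
from collections import Counter
from deeprobust.graph.data import Dataset

data = Dataset(root='/tmp/', name='cora')
labels, idx_test = data.labels, data.idx_test
counts = Counter(labels[idx_test].tolist())
majority_class, majority_count = counts.most_common(1)[0]
print(counts)
print('majority-class baseline: %.3f' % (majority_count / len(idx_test)))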
Really thanks! There was indeed a problem in the old test_pgd.py, and now it is right! I also found that different data partitions caused by different random seeds do have a great impact on the results!
I have another question about test_pgd.py after you updated it (comparing the previous and the new version). I know that in the new version fake_labels is predicted by target_model, so we still do not use the test set's true labels, but I wonder why, for the test-time attack, we do not use the true labels of the training set together with the predicted labels of the test set. I also wonder whether it is the test-time attack or the train-time attack that builds modified_adj through a surrogate model. Sorry to disturb you again.
Thank you for pointing that out. I want to note that the training accuracy is usually very high, so the resulting performance does not differ a lot.
ipdb> (labels[idx_train] == fake_labels[idx_train]).sum()
tensor(140)
But I agree that using the real training labels plus the predicted test labels would make more sense. I have just changed the fake labels to the combination of the real training labels and the predicted test labels. See below. https://github.com/DSE-MSU/DeepRobust/blob/2bcde200a5969dae32cddece66206a52c87c43e8/examples/graph/test_pgd.py#L91-L99
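The idea is roughly the following (a sketch, not necessarily line-for-line identical to the linked code; victim_model, features, adj, labels and idx_train are as in the earlier pipeline sketch): take the model's predictions for every node and then overwrite the entries on the training nodes with the ground-truth labels, which the attacker is assumed to know.
output = victim_model.predict(features, adj)   # log-probabilities from the trained GCN
fake_labels = output.argmax(dim=1)             # predicted labels for all nodes
fake_labels[idx_train] = labels[idx_train]     # training labels are known, so keep the real ones
# fake_labels is then passed to the attack instead of the ground-truth labels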
Really thanks!
I found what might be a reason for the issue: in your test_min_max.py, when you implement the min-max topology attack, within one epoch you only do one step of the inner maximization, followed by the outer minimization (PGD, as in the test-time attack). In the paper, the author says:
But there is still one question: the author said the PGD attack is easier than min-max, yet in their results min-max causes a higher misclassification rate! This seems contradictory to the results they showed.
Hi,
(1) Yeah, that may be the point. We are a bit short-handed when it comes to testing results while tuning the hyper-parameter for the inner loops. It would be great if you could help us test it.
(2) When they say "easier", I think they mean that the optimization problem is easier to solve (or easier to get the optimal solution) as it does not involve the complex inner optimization. As a result, compared to min-max attacks, CE-PGD and CW-PGD yield better attacking performance since it is easier to attack a pre-defined GCN. However, min-max attacks work better when we need to attack a model that will be retrained. This is because min-max attack considers the inner optimization of the GNN model trained on the attacked data.
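In DeepRobust this roughly corresponds to using PGDAttack for the evasion setting and MinMax for the poisoning setting, and then retraining a fresh GCN on modified_adj before evaluation. A sketch under the same assumptions and variable names as the earlier pipeline sketch (class names and signatures may differ between versions):
from deeprobust.graph.global_attack import MinMax
from deeprobust.graph.defense import GCN

attacker = MinMax(model=victim_model, nnodes=adj.shape[0], loss_type='CE', device='cpu')
attacker.attack(features, adj, labels, idx_train, n_perturbations)
modified_adj = attacker.modified_adj

# poisoning evaluation: retrain from scratch on the perturbed graph
gcn_poisoned = GCN(nfeat=features.shape[1], nhid=16, nclass=labels.max().item() + 1, device='cpu')
gcn_poisoned.fit(features, modified_adj, labels, idx_train, idx_val)
gcn_poisoned.test(idx_test)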
I understand, really thanks!
I ran test_pgd.py, test_random.py, and test_DICE.py, as well as test_nettack.py, with ptb_rate=0.2, but when I test the classification accuracy on the perturbed graph (modified_adj), the accuracy for all of these scripts is about 80%. I don't know why; can you help me?