ChandlerBang / Pro-GNN

Implementation of the KDD 2020 paper "Graph Structure Learning for Robust Graph Neural Networks"
https://arxiv.org/abs/2005.10203

about the details of the nettack experiment #12

Open Gmrylbx opened 3 years ago

Gmrylbx commented 3 years ago

Hi! Thanks for sharing the code. I'd like to ask about the details of the nettack experiment. I noticed in the paper that you only selected 10% of the target nodes as the test set when you ran the nettack experiment on the Pubmed dataset. So when I load the Pubmed dataset from the DeepRobust repository and set the parameter 'ptb_rate'=1.0, there are 186 targeted nodes. Do I just need to sample 10% of them, i.e. 18 nodes, as my test set?

ChandlerBang commented 3 years ago

Hi,

Thanks for your interest in our work. The 186 target nodes are already sampled, so you don't need to sample from them again. According to our paper:

> The nodes in the test set with degree larger than 10 are set as target nodes. For the Pubmed dataset, we only sample 10% of them.

This means we first obtain the test nodes with degree larger than 10 (there are 1860 such nodes) and then sample 10% of them; hence the number of target nodes is 186.
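
In code, the selection looks roughly like this (a minimal sketch; data is a deeprobust.graph.data.Dataset object, and the random seed is an assumption, not the paper's value):

import numpy as np

degrees = data.adj.sum(0).A1
# test nodes with degree larger than 10 (~1860 on Pubmed)
high_degree = [n for n in data.idx_test if degrees[n] > 10]
np.random.seed(15)  # seed is an assumption
# sample 10% of them -> 186 target nodes
target_nodes = np.random.choice(high_degree, len(high_degree) // 10, replace=False)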

Gmrylbx commented 3 years ago

Thanks a lot!

Gmrylbx commented 3 years ago

Hi!

Could you release the code that generates the nettack attack? I want to use all the target nodes as the test set (the test nodes with degree larger than 10).

Thanks a lot!

ChandlerBang commented 3 years ago

Basically, we just sequentially attack those target nodes. I modified the example code as follows:

from deeprobust.graph.defense import GCN
from deeprobust.graph.targeted_attack import Nettack
from deeprobust.graph.utils import *
from deeprobust.graph.data import Dataset

def attack_all():
    degrees = adj.sum(0).A1
    node_list = select_nodes()  # obtain the nodes to be attacked
    num = len(node_list)
    print('=== Attacking %s nodes sequentially ===' % num)
    modified_adj = adj
    for target_node in node_list:
        # perturbation budget = degree of the target node
        n_perturbations = int(degrees[target_node])
        model = Nettack(surrogate, nnodes=modified_adj.shape[0], attack_structure=True, attack_features=False, device=device)
        model = model.to(device)
        model.attack(features, modified_adj, labels, target_node, n_perturbations, verbose=False)
        # carry the perturbed graph into the next attack step
        modified_adj = model.modified_adj
    return modified_adj
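
For completeness, here is a sketch of the setup the snippet above assumes, following the usual DeepRobust example (dataset name, hidden size, and training options are assumptions, not necessarily the exact values used in the paper):

from deeprobust.graph.data import Dataset
from deeprobust.graph.defense import GCN

device = 'cpu'  # or 'cuda'
data = Dataset(root='/tmp/', name='pubmed')  # root path is an assumption
adj, features, labels = data.adj, data.features, data.labels
idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test

# linear GCN surrogate, as in the DeepRobust nettack example
surrogate = GCN(nfeat=features.shape[1], nclass=labels.max().item() + 1,
                nhid=16, with_relu=False, device=device).to(device)
surrogate.fit(features, adj, labels, idx_train, idx_val, patience=30)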

Feel free to let me know if you have further questions.

Gmrylbx commented 3 years ago

Thanks a lot!

Gmrylbx commented 3 years ago

Hi!

I've run into a problem when using Nettack to attack the polblogs dataset with n_perturbations=1. The code is as follows:

from tqdm import tqdm

modified_adj = adj
print('=== [Poisoning] Attacking %s nodes respectively ===' % len(node_list))
for target_node in tqdm(node_list):
    model = Nettack(surrogate, nnodes=modified_adj.shape[0], attack_structure=True, attack_features=False, device=device)
    model = model.to(device)
    model.attack(features, modified_adj, labels, target_node, int(n_perturbations), verbose=False)
    modified_adj = model.modified_adj
    print(modified_adj.nnz)
modified_adj = modified_adj.tocsr()

The original graph has 33430 nonzero entries (nnz), but after sequentially attacking 443 nodes with n_perturbations=1, the modified_adj has only 33364 nnz. Is that correct? Why does modified_adj have fewer edges than the original adj?

ChandlerBang commented 3 years ago

Hi, I would suggest checking the changes made to the adjacency matrix at each iteration. It can happen that the attacker deletes existing edges rather than adding new ones, which is why nnz can decrease.
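
For example (a minimal sketch; prev_adj is a hypothetical name for the adjacency matrix saved before the current attack step):

# compare the graph before and after one attack step (both as CSR matrices)
diff = modified_adj.tocsr() - prev_adj.tocsr()
added = int((diff > 0).nnz / 2)    # each undirected edge is stored twice
removed = int((diff < 0).nnz / 2)
print('added %d edge(s), removed %d edge(s)' % (added, removed))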

Gmrylbx commented 2 years ago

> Basically, we just sequentially attack those target nodes. I modified the example code as follows [...]

When you compare the defense performance of different models under Nettack, do these models use the same dataset?

I mean, I use GCN as the surrogate model to attack the graph structure, and then train the other models on this modified graph. Is this correct?

I think each model should be used as its own surrogate when testing its defense performance. Is that right?

yx606 commented 2 years ago

Hi! Thanks for sharing the code. I'd like to ask about the details of the datasets. For the Citeseer dataset, the number of edges in the LCC is 3668 in your article, but 3757 in some other articles. Why is the number different?

ChandlerBang commented 1 year ago

> Hi! Thanks for sharing the code. I'd like to ask about the details of the datasets. For the Citeseer dataset, the number of edges in the LCC is 3668 in your article, but 3757 in some other articles. Why is the number different?

Sorry for the late reply (I just noticed this message). I am not sure why the difference happens, but according to my experiments the number should be 3668. I remember it was also 3668 for Citeseer when I checked the original nettack code.
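
You can double-check the count from the DeepRobust copy of the data (a quick sketch; the root path is an assumption, and the loader returns the nettack-style preprocessed graph):

from deeprobust.graph.data import Dataset

data = Dataset(root='/tmp/', name='citeseer')
# each undirected edge is stored as two nonzero entries
print(data.adj.nnz // 2)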

ChandlerBang commented 1 year ago

> When you compare the defense performance of different models under Nettack, do these models use the same dataset?
>
> I mean, I use GCN as the surrogate model to attack the graph structure, and then train the other models on this modified graph. Is this correct?
>
> I think each model should be used as its own surrogate when testing its defense performance. Is that right?

Sorry for the late reply (I just noticed this message). I simply used GCN as the surrogate model and generated the attacked graphs; all (defense) models were then trained and evaluated on the same attacked graphs.
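
As a sketch of that protocol (illustrative only; a plain GCN stands in for any defense model, and every defense would be trained on the exact same modified_adj):

from deeprobust.graph.defense import GCN

modified_adj = attack_all()  # attacked graph produced with the GCN surrogate
defense = GCN(nfeat=features.shape[1], nclass=labels.max().item() + 1,
              nhid=16, device=device).to(device)
defense.fit(features, modified_adj, labels, idx_train, idx_val, patience=30)
defense.test(idx_test)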