Problem of experiments results on Polblogs dataset

PatPathead commented 3 years ago

Hi, I found a problem with the Polblogs dataset: I cannot fully reproduce the experimental results under a single random seed. I tested the models against Metattack at a perturbation rate of 10%, but no single seed reproduces all of the reported results consistently.

With random seed 10: GCN: 0.8680981595092024, RGCN: 0.8629856850715747, ProGNN: 0.82719836400818, ProGNN-fs: 0.8384458077709612

The ProGNN and ProGNN-fs results are consistent with your paper.

With random seed 15: GCN: 0.7198364008179959, RGCN: 0.7157464212678937, ProGNN: 0.7147239263803682, ProGNN-fs: 0.7157464212678937

The GCN and RGCN results are consistent with your paper.

These are my ProGNN parameter settings on the Polblogs dataset; all code is based on DeepRobust:

    args.epochs = 1200
    args.gamma = 1
    args.alpha = 5e-4
    args.beta = 1.5
    args.lambda_ = 0
    args.lr = 5e-4

If I am not mistaken, you ran the experiments with different random seeds. Could you help me check this when you are available?

Thanks in advance!

ChandlerBang commented 3 years ago

Hi,

(1) For Pro-GNN on Polblogs, please see the script in polblogs_meta.sh.

(2) As for GCN, the variance of its performance should not be so large (71.9%-86.8% in your case). I guess you are not using the same data splits for attack and defense with random seed 10. The splits used for the attack (Metattack) and the defense are supposed to be the same; otherwise the defense performance can appear very high.

To address this issue, use the latest code in train.py with setting='prognn', which makes sure the data splits are the same. See more details here.
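A quick way to confirm that the attack and the defense really share their splits is to compare the index arrays directly. The snippet below is only a minimal sketch: `same_splits` is a hypothetical helper (not part of DeepRobust or Pro-GNN), and the node indices are toy values.

```python
import numpy as np

def same_splits(attack_splits, defense_splits):
    """Return True if both dicts hold identical train/val/test node sets."""
    for key in ("idx_train", "idx_val", "idx_test"):
        a = np.sort(np.asarray(attack_splits[key]))
        b = np.sort(np.asarray(defense_splits[key]))
        if not np.array_equal(a, b):
            return False
    return True

# Toy node indices, for illustration only.
attack = {"idx_train": [0, 1], "idx_val": [2], "idx_test": [3, 4]}
defense_ok = {"idx_train": [1, 0], "idx_val": [2], "idx_test": [4, 3]}
defense_bad = {"idx_train": [3, 1], "idx_val": [2], "idx_test": [4, 0]}

print(same_splits(attack, defense_ok))   # same node sets, different order
print(same_splits(attack, defense_bad))  # train/test nodes swapped
```

In a real run, the dictionaries would be filled from the `idx_train`, `idx_val`, and `idx_test` arrays used in the attack script and in the defense script respectively.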

PatPathead commented 3 years ago

Thanks for your reply. I noticed that polblogs_meta.sh sets the random seed to 10, whereas the attacked Polblogs graph provided in DeepRobust was generated with random seed 15, as the screenshot I attached shows. I think this is consistent with my results.

If ProGNN is run with random seed 10 on a graph that was attacked with random seed 15, I think that would cause the problem described above.
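To illustrate why a seed mismatch matters, here is a rough sketch of a seed-dependent random split. It is a generic illustration, not DeepRobust's actual splitting routine; `random_split`, the 10%/10%/80% fractions, and the graph size of 1000 nodes are all assumptions made for the example.

```python
import numpy as np

def random_split(n_nodes, seed, train_frac=0.1, val_frac=0.1):
    """Shuffle node ids with the given seed and cut them into train/val/test."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_nodes)
    n_train = int(train_frac * n_nodes)
    n_val = int(val_frac * n_nodes)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

n_nodes = 1000  # hypothetical graph size, not the real Polblogs node count

train_10, val_10, test_10 = random_split(n_nodes, seed=10)
train_15, val_15, test_15 = random_split(n_nodes, seed=15)

# Training nodes under seed 10 that are test nodes under seed 15.
crossed = np.intersect1d(train_10, test_15)
print(f"{crossed.size} of {train_10.size} seed-10 training nodes "
      f"are test nodes under seed 15")
```

With splits like these, a defense trained under one seed is evaluated on a different set of nodes than the one the attack was run against, so the reported accuracy is not comparable.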

PatPathead commented 3 years ago

I also re-ran Metattack with random seed 15 and got the following results: GCN: 0.820040899795501, RGCN: 0.8169734151329244, ProGNN: 0.9243353783231085

I think this is consistent with my observation. I am not sure whether the attacked Polblogs graph you provide was actually poisoned with random seed 10.

Thanks!

ChandlerBang commented 3 years ago

Are you using the latest code? If you use the following code to load the data splits, the splits are always fixed and the same as the ones used in attack.

# data = Dataset(root='/tmp/', name=args.dataset, setting='nettack', seed=15)
data = Dataset(root='/tmp/', name=args.dataset, setting='prognn')
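The idea behind fixed splits can be sketched as follows: the indices are read from a file instead of being drawn from a random generator, so no seed is involved. This is only an illustration under stated assumptions: `load_fixed_splits` is a hypothetical helper, and the JSON layout with keys `idx_train`, `idx_val`, and `idx_test` is assumed for the example rather than taken from this thread.

```python
import json
import os
import tempfile

import numpy as np

def load_fixed_splits(path):
    """Read train/val/test node indices from a JSON file (assumed layout)."""
    with open(path) as f:
        splits = json.load(f)
    return (np.array(splits["idx_train"]),
            np.array(splits["idx_val"]),
            np.array(splits["idx_test"]))

# Write a toy split file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
toy_path = os.path.join(tmp_dir, "toy_splits.json")
with open(toy_path, "w") as f:
    json.dump({"idx_train": [0, 1], "idx_val": [2], "idx_test": [3, 4]}, f)

# Loading twice gives identical arrays; no random seed is involved.
first = load_fixed_splits(toy_path)
second = load_fixed_splits(toy_path)
print(all(np.array_equal(a, b) for a, b in zip(first, second)))
```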

PatPathead commented 3 years ago

Hi, I have set it up as you instructed:

    data = Dataset(root='data/', name=args.dataset, setting='prognn')
    adj, features, labels = data.adj, data.features, data.labels
    idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
    idx_test = idx_test
    idx_unlabeled = np.union1d(idx_val, idx_test)
    idx_all = np.union1d(idx_train,idx_unlabeled)
    adj, features, labels = preprocess(adj, features, labels, preprocess_adj=False)

    perturbed_data = PrePtbDataset(root='data/', name=args.dataset, attack_method='meta', ptb_rate=args.ptb_rate)
    modified_adj = torch.FloatTensor(perturbed_data.adj.todense())
    modified_features = features
    print('download successfully!')

I ran it three times, and the results are not much different from those I observed before:

GCN: 0.7208588957055214, RGCN: 0.7085889570552147, ProGNN: 0.7668711656441719

GCN: 0.7361963190184049, RGCN: 0.7147239263803682, ProGNN: 0.7269938650306749

GCN: 0.7137014314928425, RGCN: 0.7075664621676893, ProGNN: 0.7269938650306749

Also, the split that the 'prognn' setting provides was not generated with random seed 15, so I suspect it may not be consistent with the attacked version.

Thanks!

ChandlerBang commented 3 years ago

Hi, first I want to point out that

If we use data = Dataset(root='data/', name=args.dataset, setting='prognn') to load the data, the given random seed does not affect the loaded data splits.

I just ran several seeds and found that GCN achieves an accuracy of around 69% on Polblogs with 15% Metattack perturbations (ProGNN around 85%). So I think you are still not loading the data splits correctly (check whether your folder contains the file polblogs_prognn_splits.json). I would suggest you first reinstall deeprobust:

git clone https://github.com/DSE-MSU/DeepRobust.git
cd DeepRobust
python setup.py install

Then create a new folder and clone the newest Pro-GNN into it:

git clone https://github.com/ChandlerBang/Pro-GNN.git
cd Pro-GNN
sh scripts/meta/cora_meta.sh
sh scripts/meta/gcn.sh
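The check for polblogs_prognn_splits.json suggested above could be scripted roughly like this. It is a minimal sketch: `has_split_file` is a hypothetical helper, and building the file name as `<dataset>_prognn_splits.json` for other datasets is an assumption generalized from the one file name mentioned here.

```python
import os
import tempfile

def has_split_file(root, dataset="polblogs"):
    """Return True if the split file for `dataset` exists under `root`."""
    # File name pattern assumed from 'polblogs_prognn_splits.json'.
    return os.path.isfile(os.path.join(root, f"{dataset}_prognn_splits.json"))

# Self-contained demo on a temporary folder instead of a real data directory.
demo_root = tempfile.mkdtemp()
print(has_split_file(demo_root))  # folder is empty, so the file is missing

open(os.path.join(demo_root, "polblogs_prognn_splits.json"), "w").close()
print(has_split_file(demo_root))  # the file now exists
```

In practice, `root` would be the same directory passed as `root=` when loading the data, for example 'data/'.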

Let me know if you have any other questions.

PatPathead commented 3 years ago

Thanks for your patience! I have solved this problem.

ChandlerBang / Pro-GNN

Problem of experiments results on Polblogs dataset #8