allenhaozhu / SSGC

Implementation for Simple Spectral Graph Convolution in ICLR 2021

Citations Hyperparameter Search #15

Closed jackd closed 1 year ago

jackd commented 1 year ago

Hi, I'm a researcher looking at reproducing the results of the hyperparameter search and I'm struggling to get values similar to those in the paper. The implementations here have a few errors (I'm not sure if this is to do with pytorch version updates, but I don't think so...), so I've created PR #14 which makes things runnable. However, the results it generates are significantly worse than those published. Can you provide additional details about the hyperparameter search you conducted, or guidance on where I've gone wrong?

Thanks in advance.

allenhaozhu commented 1 year ago

Thanks for your attention. If you have experience with SGC (simplifying graph convolution), you will know that these graph-convolution-based methods depend heavily on lr and weight decay. You can check PR #3; if there is any problem please let me know.

jackd commented 1 year ago

@allenhaozhu do you mean PR #9? If so, I'm unsure how to interpret that - are you suggesting it provides good hyperparameters? Because I get even worse results than with my PR.

# hyperparameters from https://github.com/CrawlScript/tf_geometric/blob/master/demo/demo_ssgc.py
python citation_cora.py --repeats 10 --weight_decay 2e-3 --lr 1e-3 --epochs 401 --dropout 0.5 --degree 10 --alpha 0.1
# test_acc = 0.8094 +- 0.0014
# val_acc  = 0.7872 +- 0.0016

Regardless, I am satisfied that there are hyperparameters that get good test classification rates. However, there are plenty of hyperparameter sets that give better validation accuracies than those provided in args_XXX, at least for cora and citeseer. Example results below.

python tuning_cora.py  # will create hyperparameter file
# Best weight decay: 4.29e-06
python citation_cora.py --repeats 10 --tuned
# Overall
# test_acc = 0.8112 +- 0.0012
# val_acc  = 0.8114 +- 0.0009
python citation_cora.py --repeats 10
# Overall
# test_acc = 0.8257 +- 0.0005 <- not as good as reported, but could be considered close
# val_acc  = 0.8090 +- 0.0010 <- worse than that found in 60-run hyperparameter tune above

python tuning_citeseer.py  # will create hyperparameter file
# Best weight decay: 5.06e-05
python citation_citeseer.py --repeats 10 --tuned
# Overall
# test_acc = 0.7240 +- 0.0000
# val_acc  = 0.7460 +- 0.0000
python citation_citeseer.py --repeats 10
# Overall
# test_acc = 0.7300 +- 0.0000 <- not as good as reported, but could be considered close
# val_acc  = 0.7440 +- 0.0000 <- worse than that found in 60-run hyperparameter tune above

I've repeated the above hyperparameter search a number of times and results are consistent.
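For reference, the weight-decay search I'm running has roughly the following shape: a 60-evaluation TPE search over weight decay only, selecting by validation accuracy. This is a simplified sketch of my own script rather than code from this repository; the search bounds are illustrative and train_and_eval is a hypothetical stand-in for training the SSGC linear classifier and returning its validation accuracy.

from math import log

from hyperopt import fmin, hp, tpe


def objective(space):
    # train_and_eval is a hypothetical helper: train with the given weight decay
    # and return validation accuracy.
    val_acc = train_and_eval(weight_decay=space['weight_decay'])
    return -val_acc  # hyperopt minimises, so negate validation accuracy


space = {'weight_decay': hp.loguniform('weight_decay', log(1e-7), log(1e-3))}
best = fmin(objective, space=space, algo=tpe.suggest, max_evals=60)
print('Best weight decay: {:.2e}'.format(best['weight_decay']))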

My questions are:

allenhaozhu commented 1 year ago

Maybe I see your point. I do not use tuning_XXX.py; that is for SGC. And your citation_cora result is pretty low, it should be more than 0.83. Sorry for my terrible habits, let me check. Also, what I meant is https://github.com/allenhaozhu/SSGC/issues/3, where we discuss the issue.

allenhaozhu commented 1 year ago

import argparse
import torch


def get_citation_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--no-cuda', action='store_true', default=False, help='Disables CUDA training.')
    parser.add_argument('--seed', type=int, default=42, help='Random seed.')
    parser.add_argument('--epochs', type=int, default=120, help='Number of epochs to train.')
    parser.add_argument('--lr', type=float, default=0.2, help='Initial learning rate.')
    parser.add_argument('--alpha', type=float, default=0.15, help='alpha.')
    parser.add_argument('--weight_decay', type=float, default=1e-05, help='Weight decay (L2 loss on parameters).')
    parser.add_argument('--hidden', type=int, default=0, help='Number of hidden units.')
    parser.add_argument('--dropout', type=float, default=0, help='Dropout rate (1 - keep probability).')
    parser.add_argument('--dataset', type=str, default="cora", help='Dataset to use.')
    parser.add_argument('--model', type=str, default="SGC", choices=["SGC", "GCN"], help='model to use.')
    parser.add_argument('--feature', type=str, default="mul", choices=['mul', 'cat', 'adj'], help='feature-type')
    parser.add_argument('--normalization', type=str, default='AugNormAdj', choices=['NormAdj', 'AugNormAdj'], help='Normalization method for the adjacency matrix.')
    parser.add_argument('--degree', type=int, default=16, help='degree of the approximation.')
    parser.add_argument('--per', type=int, default=-1, help='Number of each nodes so as to balance.')
    parser.add_argument('--experiment', type=str, default="base-experiment", help='feature-type')
    parser.add_argument('--tuned', action='store_true', help='use tuned hyperparams')

    args, _ = parser.parse_known_args()
    args.cuda = not args.no_cuda and torch.cuda.is_available()
    return args

In this setting:

/home/allenzhu/Downloads/shaDow_GNN/venv/bin/python /home/allenzhu/Downloads/SSGC/citation_cora.py
0.0255s
Validation Accuracy: 0.8060 Test Accuracy: 0.8350
Pre-compute time: 0.0255s, train time: 0.0581s, total: 0.0836s

Process finished with exit code 0

allenhaozhu commented 1 year ago

I am not in the habit of pushing the performance to the highest possible value. I always start writing my paper when I find it is good enough.

jackd commented 1 year ago

That run uses $\alpha=0.15$, which is different from the paper, but I'm fine with that part. However, if I repeat the hyperparameter optimization I still find weight_decay values that give a better validation accuracy and a poorer test accuracy.

The above hyperparameter settings:

python citation_cora.py --repeats 10 --alpha 0.15 --weight_decay 1e-5 --epochs 120
# Overall
# test_acc = 0.8350 +- 0.0000
# val_acc  = 0.8064 +- 0.0008

Rerunning hyperparameter search first:

python tuning_cora.py --alpha 0.15 --epochs 120
# Best weight decay: 5.39e-06
python citation_cora.py --repeats 10 --tuned --alpha 0.15
# Overall
# test_acc = 0.8262 +- 0.0004
# val_acc  = 0.8102 +- 0.0017
jackd commented 1 year ago

Note that with the default hyperparameters from master and the bugged implementation (the one with the $(1 - \alpha)^k$ factor), I get values much closer to those reported:

# bugged sgc_precompute implementation
python citation_cora.py --repeats 10 
# Overall
# test_acc = 0.8349 +- 0.0003
# val_acc  = 0.8060 +- 0.0000

python citation_citeseer.py --repeats 10
# Overall
# test_acc = 0.7330 +- 0.0000
# val_acc  = 0.7524 +- 0.0008

# correct sgc_precompute implementation (currently in master)
python citation_cora.py --repeats 10 
# Overall
# test_acc = 0.8279 +- 0.0007
# val_acc  = 0.8046 +- 0.0018

python citation_citeseer.py --repeats 10
# Overall
# test_acc = 0.7300 +- 0.0000
# val_acc  = 0.7440 +- 0.0000

I've also had a look at the git history, and it seems the fix wasn't applied until after the conference - though I realize the code used to generate results for a paper often isn't the same as the code released alongside the paper :).
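For concreteness, here is how I'm reading the two variants. This is my own sketch rather than the repository's code; the function names are mine, and the exact structure of the pre-fix version is an assumption based on the $(1 - \alpha)^k$ behaviour I observed.

import torch


def precompute_fixed(features, adj, degree, alpha):
    # My reading of the intended propagation: average over k of
    # (1 - alpha) * adj^k X, plus an alpha-weighted residual of the raw features.
    emb = alpha * features
    prop = features
    for _ in range(degree):
        prop = torch.spmm(adj, prop)
        emb = emb + (1 - alpha) * prop / degree
    return emb


def precompute_bugged(features, adj, degree, alpha):
    # Assumed pre-fix behaviour: (1 - alpha) is folded into the running
    # propagation, so the k-th term effectively carries a (1 - alpha)**k weight.
    emb = alpha * features
    prop = features
    for _ in range(degree):
        prop = (1 - alpha) * torch.spmm(adj, prop)
        emb = emb + prop / degree
    return emb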

jackd commented 1 year ago

I do not use tuning_XXX.py that is for SGC

My apologies, I missed this comment. The section about hyperopt that I recall reading is indeed from SGC.

In that case, can you clarify how you chose your weight_decay parameters? Are these tuned somehow to optimize performance on the test set?

allenhaozhu commented 1 year ago

This weight_decay is the default in SGC. But sometimes I will test a few values at different scales, like 1e-3, 1e-4, 1e-5, because the variance of the feature magnitudes is pretty big, and then the weight decay will influence the result significantly. And by the way, about issue #3: the proposer pointed out a potential bug in our original code (torch deep copy). We have fixed that, although it slightly affects the performance. But the bug only affects cora, pubmed and citeseer; for the other experiments the code is OK.
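To make that concrete, the kind of manual sweep I mean is just re-running the citation script at a few weight-decay scales and comparing the printed accuracies. A rough sketch (my own snippet, not part of the repo):

import subprocess

# Sweep a few weight-decay scales with the flags discussed above; the script
# itself prints the validation/test accuracies used for comparison.
for wd in ("1e-3", "1e-4", "1e-5"):
    subprocess.run(
        ["python", "citation_cora.py", "--repeats", "10",
         "--alpha", "0.15", "--epochs", "120", "--weight_decay", wd],
        check=True,
    )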

jackd commented 1 year ago

sometimes I will test a few values at different scales, like 1e-3, 1e-4, 1e-5

I'm fine with the idea of a manual hyperparameter search. My question is: once you have validation and test accuracies for a variety of different configurations, how do you decide which to report?

allenhaozhu commented 1 year ago

I would like to report the highest test accuracy.