AI-secure / Transferability-Reduced-Smooth-Ensemble


How to set hyperparameters for TRS #1

Open ZY123-GOOD opened 2 years ago

ZY123-GOOD commented 2 years ago

Hello, thank you for sharing your paper and code. However, I have some questions. I ran the TRS training code but found that the clean test accuracy is around 60%, which is much lower than with vanilla training. Moreover, the transferability is not as good as reported in the paper. Perhaps this is because the file "utils/Empirical/arguments.py" does not exist.

YunDuanZhiNeng commented 2 years ago

Hello, thank you for sharing your paper and code. I have encountered the same problem. Could the authors share the training parameters or a pre-trained model?

Lucas110550 commented 2 years ago

Hi, sorry for the late reply. All the hyper-parameters can be found in our paper. Specifically, in our code we defined our arguments as:

  • lambda_a = scale coeff, lambda_b = scale lamda
  • plus-adv = {"True": TRS training; "False": cos-l2 training}
  • adv_eps: the maximum eps for smoothness loss computation (\delta_M in our paper)
  • init_eps: the starting eps for smoothness loss computation (\delta_0 in our paper)

For the CIFAR experiments, we have lambda_a = 40, lambda_b = 2.5, \delta_0 = 0.01 and \delta_M = 0.03. For MNIST, we considered various hyper-parameter settings, as described in our Appendix.
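For reference, these arguments correspond to the training script's command-line flags. Here is a sketch of a CIFAR-10 run with the values above, assembled from the flag names used in the commands later in this thread (the GPU id is a placeholder, not an official command):

```
CUDA_VISIBLE_DEVICES=0 python train/Empirical/train_trs.py cifar10 cifar_resnet20 \
  --num-models 3 --epochs 200 --scale 1.0 --coeff 40 --lamda 2.5 \
  --plus-adv --init-eps 0.01 --adv-eps 0.03
```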

Feel free to ask me any other questions. Thanks!

HuangZuShu commented 1 year ago

> Hi, sorry for the late reply. All the hyper-parameters can be found in our paper. Specifically, in our code we defined our arguments as:
>
> • lambda_a = scale coeff, lambda_b = scale lamda
> • plus-adv = {"True": TRS training; "False": cos-l2 training}
> • adv_eps: the maximum eps for smoothness loss computation (\delta_M in our paper)
> • init_eps: the starting eps for smoothness loss computation (\delta_0 in our paper)
>
> For the CIFAR experiments, we have lambda_a = 40, lambda_b = 2.5, \delta_0 = 0.01 and \delta_M = 0.03. For MNIST, we considered various hyper-parameter settings, as described in our Appendix.
>
> Feel free to ask me any other questions. Thanks!

Hi, could you please provide a specific command to train and evaluate the model on CIFAR-10?

I directly used the script script_train.py you provided to generate training commands, but the training results are not consistent with the paper. Is this a parameter-setting problem?

Here is a command example, but I found the result differs from the paper: Acc@1 is only about 42%.

CUDA_VISIBLE_DEVICES=4 python3 -u trs/train/train_trs.py cifar10 cifar_resnet20 --lr 0.001000 --coeff 2.00 --lamda 2.00 --scale 5.00 --init-eps 0.10 --adv-eps 0.20 --num-models 3 --plus-adv --epochs 200

Epoch: [40][270/391] Time 2.991 Data 0.002 Loss 0.8694 Acc@1 93.211 Acc@5 99.859
Epoch: [40][280/391] Time 2.991 Data 0.002 Loss 0.8704 Acc@1 93.186 Acc@5 99.861
Epoch: [40][290/391] Time 2.991 Data 0.002 Loss 0.8739 Acc@1 93.149 Acc@5 99.858
Epoch: [40][300/391] Time 2.992 Data 0.002 Loss 0.8763 Acc@1 93.127 Acc@5 99.860
Epoch: [40][310/391] Time 2.991 Data 0.002 Loss 0.8767 Acc@1 93.114 Acc@5 99.857
Epoch: [40][320/391] Time 2.991 Data 0.002 Loss 0.8755 Acc@1 93.112 Acc@5 99.854
Epoch: [40][330/391] Time 2.990 Data 0.002 Loss 0.8781 Acc@1 93.089 Acc@5 99.851
Epoch: [40][340/391] Time 2.990 Data 0.002 Loss 0.8760 Acc@1 93.115 Acc@5 99.851
Epoch: [40][350/391] Time 2.989 Data 0.002 Loss 0.8779 Acc@1 93.105 Acc@5 99.851
Epoch: [40][360/391] Time 2.989 Data 0.002 Loss 0.8800 Acc@1 93.070 Acc@5 99.855
Epoch: [40][370/391] Time 2.988 Data 0.002 Loss 0.8791 Acc@1 93.093 Acc@5 99.857
Epoch: [40][380/391] Time 2.987 Data 0.002 Loss 0.8812 Acc@1 93.059 Acc@5 99.861
Epoch: [40][390/391] Time 2.986 Data 0.002 Loss 0.8818 Acc@1 93.058 Acc@5 99.864
Test: [0/79] Time 0.211 Data 0.181 Loss 1.7234 Acc@1 45.312 Acc@5 82.031
Test: [10/79] Time 0.041 Data 0.017 Loss 1.7472 Acc@1 43.608 Acc@5 82.741
Test: [20/79] Time 0.033 Data 0.010 Loss 1.7625 Acc@1 42.708 Acc@5 82.329
Test: [30/79] Time 0.030 Data 0.007 Loss 1.7602 Acc@1 42.792 Acc@5 83.140
Test: [40/79] Time 0.029 Data 0.006 Loss 1.7647 Acc@1 42.969 Acc@5 83.098
Test: [50/79] Time 0.028 Data 0.005 Loss 1.7830 Acc@1 42.233 Acc@5 82.751
Test: [60/79] Time 0.028 Data 0.004 Loss 1.7765 Acc@1 42.098 Acc@5 82.915
Test: [70/79] Time 0.027 Data 0.004 Loss 1.7751 Acc@1 42.154 Acc@5 82.592

Lucas110550 commented 1 year ago

@xiaojunxu Hi Xiaojun, could you help check this? Thanks!

HuangZuShu commented 1 year ago

> @xiaojunxu Hi Xiaojun, could you help check this? Thanks!
>
> For the CIFAR experiments, we have lambda_a = 40, lambda_b = 2.5, \delta_0 = 0.01 and \delta_M = 0.03.

Thanks! Maybe the hyper-parameter settings I used were wrong; I will try again with the parameters you gave.

Also, could you provide the parameters you used for CIFAR-100? I could not find those settings in your paper; only the MNIST parameters are given.

Or could you provide utils/Empirical/arguments.py? I think the parameters are in that file.

Thanks again!

xiaojunxu commented 1 year ago

> @xiaojunxu Hi Xiaojun, could you help check this? Thanks! For the CIFAR experiments, we have lambda_a = 40, lambda_b = 2.5, \delta_0 = 0.01 and \delta_M = 0.03.
>
> Thanks! Maybe the hyper-parameter settings I used were wrong; I will try again with the parameters you gave.
>
> Also, could you provide the parameters you used for CIFAR-100? I could not find those settings in your paper; only the MNIST parameters are given.
>
> Or could you provide utils/Empirical/arguments.py? I think the parameters are in that file.
>
> Thanks again!

Hi, the hyper-parameters for CIFAR-100 are the same as those for CIFAR-10. Please let us know if the new set of parameters does not work. Thanks.

HuangZuShu commented 1 year ago

> @xiaojunxu Hi Xiaojun, could you help check this? Thanks! For the CIFAR experiments, we have lambda_a = 40, lambda_b = 2.5, \delta_0 = 0.01 and \delta_M = 0.03.
>
> Thanks! Maybe the hyper-parameter settings I used were wrong; I will try again with the parameters you gave. Also, could you provide the parameters you used for CIFAR-100? I could not find those settings in your paper; only the MNIST parameters are given. Or could you provide utils/Empirical/arguments.py? I think the parameters are in that file. Thanks again!
>
> Hi, the hyper-parameters for CIFAR-100 are the same as those for CIFAR-10. Please let us know if the new set of parameters does not work. Thanks.

Hi, I trained with lambda_a = 40, lambda_b = 2.5, \delta_0 = 0.01 and \delta_M = 0.03 on CIFAR-10. When evaluating with a PGD attack at the 87th epoch, the result is lower than the paper's by a large margin.

Training command:

CUDA_VISIBLE_DEVICES=2,3 python train/Empirical/train_trs.py cifar10 cifar_resnet20 --num-models 3 --epochs 200 --scale 1.0 --coeff 40 --lamda 2.5 --plus-adv --adv-eps 0.03 --init-eps 0.01

The evaluation output is as follows:

{'dataset': 'cifar10', 'base_classifier': '/home/turing/data2/huangzs/work2022/Transferability-Reduced-Smooth-Ensemble/logs/Empirical/scratch/cifar10/trs/vanilla_40.0_2.5_1.0/0.01-0.03/checkpoint.pth.tar', 'attack_type': 'pgd', 'num_models': 3, 'adv_eps': 0.01, 'adv_steps': 50, 'random_start': 5, 'coeff': 0}
Model loaded
Files already downloaded and verified
Phase 0 Phase 1 Phase 2 Phase 3 Phase 4
PGD (eps = 0.01): 88.73 (Before), 20.83 (After)

{'dataset': 'cifar10', 'base_classifier': '/home/turing/data2/huangzs/work2022/Transferability-Reduced-Smooth-Ensemble/logs/Empirical/scratch/cifar10/trs/vanilla_40.0_2.5_1.0/0.01-0.03/checkpoint.pth.tar', 'attack_type': 'pgd', 'num_models': 3, 'adv_eps': 0.02, 'adv_steps': 50, 'random_start': 5, 'coeff': 0}
Model loaded
Files already downloaded and verified
Phase 0 Phase 1 Phase 2 Phase 3 Phase 4
PGD (eps = 0.02): 88.71 (Before), 1.95 (After)

{'dataset': 'cifar10', 'base_classifier': '/home/turing/data2/huangzs/work2022/Transferability-Reduced-Smooth-Ensemble/logs/Empirical/scratch/cifar10/trs/vanilla_40.0_2.5_1.0/0.01-0.03/checkpoint.pth.tar', 'attack_type': 'pgd', 'num_models': 3, 'adv_eps': 0.03, 'adv_steps': 50, 'random_start': 5, 'coeff': 0}
Model loaded
Files already downloaded and verified
Phase 0 Phase 1 Phase 2 Phase 3 Phase 4
PGD (eps = 0.03): 88.04 (Before), 0.13 (After)

xiaojunxu commented 1 year ago

Thanks for pointing out this issue!

tl;dr: we are observing a similar result to yours and will update the repo. Before our update, if you want to get results similar to the paper, you can 1) tune parameters to increase --coeff and --lamda, and 2) initialize the model with vanilla-trained ones.
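(For concreteness, a sketch of point 1: the training command from earlier in this thread with a larger --coeff. The value 100 matches the setting HuangZuShu reports below, and the GPU id is a placeholder.)

```
CUDA_VISIBLE_DEVICES=0 python train/Empirical/train_trs.py cifar10 cifar_resnet20 \
  --num-models 3 --epochs 200 --scale 1.0 --coeff 100 --lamda 2.5 \
  --plus-adv --adv-eps 0.03 --init-eps 0.01
```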

We observed a similar result when replicating the experiments. Comparing this open-sourced version with the original uncleaned version, we found two differences. First, we originally aggregated cos_loss and smooth_loss with logsumexp; for simplification, we found that a simple average can do the aggregation as well, but this requires larger coefficients lambda1 and lambda2. Second, we initialized the TRS model with a vanilla-trained model, which is omitted in this open-sourced repo. We are going to update these settings so that the trained model reproduces the results in our paper; you can also tune it yourself before we finish the tuning process. The key idea for tuning is that stronger regularization improves adversarial robustness at the cost of some benign accuracy. In our paper, the vanilla accuracy of TRS is 86.7%, which is lower than what you trained, so it is expected that stronger lambdas will help the adversarial performance.
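(A minimal PyTorch sketch of the two differences described above; cos_loss and smooth_loss stand in for the TRS regularization terms, and the weighting scheme, names, and checkpoint path are illustrative assumptions, not the authors' actual code.)

```python
import torch

# Stand-ins for the TRS regularizers; in the real code these are the
# cosine-similarity and smoothness terms computed on the ensemble.
cos_loss = torch.tensor(0.7)
smooth_loss = torch.tensor(0.3)
lambda1, lambda2 = 40.0, 2.5  # the CIFAR coefficients quoted above

# Difference 1: how the two regularizers are aggregated.
# Open-sourced version: simple average of the weighted terms.
reg_avg = (lambda1 * cos_loss + lambda2 * smooth_loss) / 2

# Original uncleaned version: logsumexp, which softly emphasizes
# whichever weighted term is currently larger.
reg_lse = torch.logsumexp(
    torch.stack([lambda1 * cos_loss, lambda2 * smooth_loss]), dim=0)

# Difference 2: start TRS training from vanilla-trained weights rather
# than from scratch (path and checkpoint key are placeholders):
# model.load_state_dict(torch.load("vanilla.pth.tar")["state_dict"])
```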

HuangZuShu commented 1 year ago

> we are observing a similar result to yours and will update the repo.

Thank you very much! I will try initializing the model with vanilla-trained ones, and I hope you can update the repo as soon as possible.

HuangZuShu commented 1 year ago

Hi, @xiaojunxu
Q1: Is the whitebox accuracy in Table 1 of your paper evaluated on the whole test set?
Q2: Does the constant c for the CW and EAD attacks in Table 1 correspond to the parameter coeff in https://github.com/AI-secure/Transferability-Reduced-Smooth-Ensemble/blob/main/eval/Empirical/whitebox.py#L32?

xiaojunxu commented 1 year ago

> Q1: Is the whitebox accuracy in Table 1 of your paper evaluated on the whole test set?

Yes.

> Q2: Does the constant c for the CW and EAD attacks in Table 1 correspond to the parameter coeff?

Yes.

HuangZuShu commented 1 year ago

@xiaojunxu Have you finished tuning the parameters for CIFAR-10? I tried it, and the clean accuracy is about 86%, but the robust accuracy at the 90th epoch is still lower than in the paper. The training parameters and part of the evaluation results are as follows:

{
    "arch": "ResNet",
    "depth": 20,
    "model_num": 3,
    "model_file": null,
    "gpu": "4",
    "seed": 0,
    "epochs": 200,
    "lr": 0.1,
    "sch_intervals": [
        100,
        150
    ],
    "lr_gamma": 0.1,
    "data_dir": "/home/turing/data2/huangzs/data",
    "batch_size": 128,
    "dataset": "cifar10",
    "num_classes": 10,
    "coeff": 100.0,
    "lamda": 2.5,
    "scale": 1.0,
    "plus_adv": true,
    "adv_eps": 0.03,
    "init_eps": 0.01
}

[restart 1] FGSM (eps = 0.02): 86.53 (Before), 26.12 (After)
[restart 2] FGSM (eps = 0.02): 86.53 (Before), 26.12 (After)
[restart 3] FGSM (eps = 0.02): 86.53 (Before), 26.12 (After)
[restart 4] FGSM (eps = 0.02): 86.53 (Before), 26.12 (After)
[restart 5] FGSM (eps = 0.02): 86.53 (Before), 26.12 (After)
[restart 1] FGSM (eps = 0.04): 86.53 (Before), 25.23 (After)
[restart 2] FGSM (eps = 0.04): 86.53 (Before), 25.23 (After)
[restart 3] FGSM (eps = 0.04): 86.53 (Before), 25.23 (After)
[restart 4] FGSM (eps = 0.04): 86.53 (Before), 25.23 (After)
[restart 5] FGSM (eps = 0.04): 86.53 (Before), 25.23 (After)
[restart 1] BIM (eps = 0.01): 86.53 (Before), 6.62 (After)
[restart 2] BIM (eps = 0.01): 86.53 (Before), 6.62 (After)

HuangZuShu commented 1 year ago

> We observed a similar result when replicating the experiments. Comparing this open-sourced version with the original uncleaned version, we found two differences. First, we originally aggregated cos_loss and smooth_loss with logsumexp; for simplification, we found that a simple average can do the aggregation as well, but this requires larger coefficients lambda1 and lambda2. Second, we initialized the TRS model with a vanilla-trained model, which is omitted in this open-sourced repo. We are going to update these settings so that the trained model reproduces the results in our paper; you can also tune it yourself before we finish the tuning process. The key idea for tuning is that stronger regularization improves adversarial robustness at the cost of some benign accuracy. In our paper, the vanilla accuracy of TRS is 86.7%, which is lower than what you trained, so it is expected that stronger lambdas will help the adversarial performance.

Hi, @Lucas110550 @xiaojunxu.

Q1: Would you mind sending me the original uncleaned version? I urgently need to reproduce the results in your paper. This is my email: 13729102801@163.com.

Q2: What parameters do you use for APGD-CE and APGD-DLR when evaluating robustness? Would you mind sharing the code and the parameters?

Thank you very much.

xiaojunxu commented 1 year ago

We have sent the code via email. Please check the email for more details. Thanks.

HuangZuShu commented 1 year ago

@xiaojunxu @Lucas110550

Sorry to bother you again. I really could not find an attachment in your email, and I don't know why.

Would you mind giving me the code for evaluating the APGD-DLR and APGD-CE attacks in Table 12, and the clean accuracy of your algorithm on CIFAR-100?

It would be even better if you could create a new branch in this repo for your original code.

Thank you very much!

xiaojunxu commented 1 year ago

Hi,

Please check your email settings for the attachment; I see something like "attachment is filtered" in your reply.

For APGD-DLR and APGD-CE, we use the code in https://github.com/fra31/auto-attack/tree/master/autoattack for evaluation.
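(For reference, a minimal sketch of running only these two attacks with that library. The 'custom' interface is from the auto-attack README; the model, batch, and eps here are placeholders rather than the paper's evaluation setup.)

```python
import torch
import torch.nn as nn
from autoattack import AutoAttack  # from https://github.com/fra31/auto-attack

# Placeholder classifier standing in for the TRS ensemble; any nn.Module
# mapping images in [0, 1] to logits works here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()

# Placeholder CIFAR-10-shaped batch; in practice, load the real test set.
x = torch.rand(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))

# Run only APGD-CE and APGD-DLR at one of the L_inf budgets used in this thread.
adversary = AutoAttack(model, norm='Linf', eps=0.01, version='custom',
                       attacks_to_run=['apgd-ce', 'apgd-dlr'])
x_adv = adversary.run_standard_evaluation(x, y, bs=16)
```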

For CIFAR-100, our benign accuracy is 64.3%.

HuangZuShu commented 1 year ago

> Thanks for pointing out this issue!
>
> tl;dr: we are observing a similar result to yours and will update the repo. Before our update, if you want to get results similar to the paper, you can 1) tune parameters to increase --coeff and --lamda, and 2) initialize the model with vanilla-trained ones.
>
> We observed a similar result when replicating the experiments. Comparing this open-sourced version with the original uncleaned version, we found two differences. First, we originally aggregated cos_loss and smooth_loss with logsumexp; for simplification, we found that a simple average can do the aggregation as well, but this requires larger coefficients lambda1 and lambda2. Second, we initialized the TRS model with a vanilla-trained model, which is omitted in this open-sourced repo. We are going to update these settings so that the trained model reproduces the results in our paper; you can also tune it yourself before we finish the tuning process. The key idea for tuning is that stronger regularization improves adversarial robustness at the cost of some benign accuracy. In our paper, the vanilla accuracy of TRS is 86.7%, which is lower than what you trained, so it is expected that stronger lambdas will help the adversarial performance.

@xiaojunxu @Lucas110550 Sorry to bother you again. I cannot receive the attachment with the original TRS code through email in China. Could you share it with me as a Google Drive link via my email 13729102801@163.com? I want to run TRS on other datasets to finish my master's project, which is very important to me. Thank you very much.