glin2022 / atop

The code of the ICLR 2024 paper: Adversarial Training on Purification (AToP): Advancing Both Robustness and Generalization

Slow improvement in clean accuracy and robust accuracy #3

Open erhul opened 2 weeks ago

erhul commented 2 weeks ago

[screenshot: training output of cifar-miss-gau-adv]

As shown in the training screenshot of cifar-miss-gau-adv.py (I converted cifar-miss-gau-adv.ipynb into a .py file), after 69 epochs of training, the accuracy on clean samples and the robust accuracy under various attacks have only increased from the initial ~13% to about 46%, and the improvement is unstable. There is therefore a considerable gap from the ~90% reported in the original paper. Why is this?
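
For reference, a minimal way to do such a conversion, either via the `jupyter nbconvert --to script` CLI or programmatically with the nbconvert API (file names follow the notebook mentioned above):

    from nbconvert import ScriptExporter

    # Export the notebook's code cells as a Python script.
    body, _ = ScriptExporter().from_filename("cifar-miss-gau-adv.ipynb")
    with open("cifar-miss-gau-adv.py", "w") as f:
        f.write(body)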

In addition, red dots (marked by the red box) appear in the purified image shown in the right panel. Is this normal?

glin2022 commented 2 weeks ago
[screenshot: training output of cifar-miss-gau-adv.ipynb]
  1. The screenshot above shows the output from cifar-miss-gau-adv.ipynb. All accuracies reached over 80%, and the situation you described did not occur.
  2. In previous experiments, we observed similar artifacts, but not red dots. In addition, for generator-based adversarial purification (AP) methods, the purifier model $g$ is not required to satisfy $g(x+\delta)=x$; in general, we only care about the classification accuracy after purification (see the sketch below).
    So it is normal if the quality of the generated images is poor. However, because your accuracy is low, I cannot determine whether these red dots are normal.
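
A minimal sketch of what is actually measured, assuming generic `purifier` and `classifier` modules and a loader of adversarial examples (these names are illustrative, not the repository's):

    import torch

    @torch.no_grad()
    def robust_accuracy(purifier, classifier, adv_loader, device="cuda"):
        # Accuracy of classifier(g(x + delta)); g(x + delta) == x is not required.
        correct, total = 0, 0
        for x_adv, y in adv_loader:
            x_adv, y = x_adv.to(device), y.to(device)
            x_pur = purifier(x_adv)                  # purified image, possibly imperfect
            preds = classifier(x_pur).argmax(dim=1)  # only the predicted label matters
            correct += (preds == y).sum().item()
            total += y.size(0)
        return 100.0 * correct / total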

P.S. I am currently preparing a CVPR submission and a rebuttal for another conference over the next several weeks. I will reply to general questions as soon as possible; others I may check later.

erhul commented 2 weeks ago

Can you tell me what the accuracy is for each attack in the first epoch? When I use the pre-trained weights you provided, the accuracy stays around 12~14% after the first epoch. I don't know what went wrong.

glin2022 commented 1 week ago

The log output of the first 10 epochs:

Files already downloaded and verified
Files already downloaded and verified
@iter: 0: 32.0716 it/s
d_loss: 1.0004
g_loss: 0.4020
ae_loss: 0.4109
ae_loss1: 0.1552
ae_loss2: 0.2558
cls_loss: 2.8541
@iter: 100: 1.3900 it/s
d_loss: 0.9992
g_loss: 0.0541
ae_loss: 0.1281
ae_loss1: 0.0582
ae_loss2: 0.0699
cls_loss: 1.6255
Saved state dicts!
@iter: 200: 1.4268 it/s
d_loss: 0.9903
g_loss: 0.0994
ae_loss: 0.0950
ae_loss1: 0.0461
ae_loss2: 0.0489
cls_loss: 1.0380
Saved state dicts!
@iter: 300: 1.4271 it/s
d_loss: 0.9658
g_loss: 0.2570
ae_loss: 0.0918
ae_loss1: 0.0447
ae_loss2: 0.0471
cls_loss: 0.7819
Saved state dicts!
@iter: 400: 1.4124 it/s
d_loss: 0.9270
g_loss: -0.0245
ae_loss: 0.0888
ae_loss1: 0.0433
ae_loss2: 0.0456
cls_loss: 0.8451
Saved state dicts!
@iter: 500: 1.3872 it/s
d_loss: 0.9264
g_loss: 0.4548
ae_loss: 0.0871
ae_loss1: 0.0425
ae_loss2: 0.0446
cls_loss: 0.7750
Saved state dicts!
@iter: 600: 1.2979 it/s
d_loss: 0.9246
g_loss: 0.4399
ae_loss: 0.0874
ae_loss1: 0.0428
ae_loss2: 0.0445
cls_loss: 0.7078
Saved state dicts!
test acc on clean examples (%): 82.81
test acc on FGSM examples (%):  64.06
test acc on BIM examples (%):   70.12
test acc on EoT examples (%):   71.88
test acc on BPDA examples (%):  70.51
________epoch 1________
Saved state dicts!
@iter: 700: 0.4852 it/s
d_loss: 0.8984
g_loss: -0.0743
ae_loss: 0.0849
ae_loss1: 0.0418
ae_loss2: 0.0432
cls_loss: 0.7223
Saved state dicts!
@iter: 800: 1.3672 it/s
d_loss: 0.8681
g_loss: 0.0358
ae_loss: 0.0852
ae_loss1: 0.0418
ae_loss2: 0.0433
cls_loss: 0.6993
Saved state dicts!
@iter: 900: 1.4071 it/s
d_loss: 0.8637
g_loss: 0.1087
ae_loss: 0.0855
ae_loss1: 0.0422
ae_loss2: 0.0433
cls_loss: 0.7311
Saved state dicts!
@iter: 1000: 1.4436 it/s
d_loss: 0.8759
g_loss: 0.1482
ae_loss: 0.0829
ae_loss1: 0.0409
ae_loss2: 0.0421
cls_loss: 0.7511
Saved state dicts!
@iter: 1100: 1.4162 it/s
d_loss: 0.8711
g_loss: 0.1445
ae_loss: 0.0828
ae_loss1: 0.0409
ae_loss2: 0.0419
cls_loss: 0.7137
Saved state dicts!
@iter: 1200: 1.4503 it/s
d_loss: 0.8674
g_loss: 0.1524
ae_loss: 0.0842
ae_loss1: 0.0416
ae_loss2: 0.0427
cls_loss: 0.7172
Saved state dicts!
test acc on clean examples (%): 87.50
test acc on FGSM examples (%):  67.38
test acc on BIM examples (%):   74.41
test acc on EoT examples (%):   77.73
test acc on BPDA examples (%):  77.54
________epoch 2________
Saved state dicts!
@iter: 1300: 0.4883 it/s
d_loss: 0.8523
g_loss: 0.1394
ae_loss: 0.0944
ae_loss1: 0.0444
ae_loss2: 0.0500
cls_loss: 0.5795
Saved state dicts!
@iter: 1400: 1.1813 it/s
d_loss: 0.8464
g_loss: 0.1768
ae_loss: 0.0946
ae_loss1: 0.0440
ae_loss2: 0.0506
cls_loss: 0.3302
Saved state dicts!
@iter: 1500: 1.1141 it/s
d_loss: 0.8376
g_loss: 0.1459
ae_loss: 0.0921
ae_loss1: 0.0430
ae_loss2: 0.0491
cls_loss: 0.2260
Saved state dicts!
@iter: 1600: 1.1452 it/s
d_loss: 0.8564
g_loss: 0.1901
ae_loss: 0.0919
ae_loss1: 0.0432
ae_loss2: 0.0487
cls_loss: 0.2316
Saved state dicts!
@iter: 1700: 1.2116 it/s
d_loss: 0.7948
g_loss: 0.0158
ae_loss: 0.0923
ae_loss1: 0.0435
ae_loss2: 0.0488
cls_loss: 0.2646
Saved state dicts!
@iter: 1800: 1.2249 it/s
d_loss: 0.7881
g_loss: 0.0687
ae_loss: 0.0895
ae_loss1: 0.0424
ae_loss2: 0.0471
cls_loss: 0.2161
Saved state dicts!
test acc on clean examples (%): 83.20
test acc on FGSM examples (%):  82.62
test acc on BIM examples (%):   79.88
test acc on EoT examples (%):   80.08
test acc on BPDA examples (%):  80.86
________epoch 3________
Saved state dicts!
@iter: 1900: 0.4630 it/s
d_loss: 0.8019
g_loss: 0.0801
ae_loss: 0.0896
ae_loss1: 0.0420
ae_loss2: 0.0476
cls_loss: 0.2282
Saved state dicts!
@iter: 2000: 1.2021 it/s
d_loss: 0.7892
g_loss: 0.0651
ae_loss: 0.0897
ae_loss1: 0.0422
ae_loss2: 0.0475
cls_loss: 0.2482
Saved state dicts!
@iter: 2100: 1.1836 it/s
d_loss: 0.7889
g_loss: 0.0782
ae_loss: 0.0885
ae_loss1: 0.0418
ae_loss2: 0.0467
cls_loss: 0.2533
Saved state dicts!
@iter: 2200: 1.2072 it/s
d_loss: 0.7977
g_loss: 0.0595
ae_loss: 0.0870
ae_loss1: 0.0413
ae_loss2: 0.0456
cls_loss: 0.1638
Saved state dicts!
@iter: 2300: 1.1992 it/s
d_loss: 0.7942
g_loss: 0.0652
ae_loss: 0.0883
ae_loss1: 0.0420
ae_loss2: 0.0463
cls_loss: 0.1804
Saved state dicts!
@iter: 2400: 1.2182 it/s
d_loss: 0.7910
g_loss: 0.0677
ae_loss: 0.0894
ae_loss1: 0.0423
ae_loss2: 0.0471
cls_loss: 0.2003
Saved state dicts!
test acc on clean examples (%): 87.30
test acc on FGSM examples (%):  88.09
test acc on BIM examples (%):   84.96
test acc on EoT examples (%):   85.16
test acc on BPDA examples (%):  84.38
________epoch 4________
Saved state dicts!
@iter: 2500: 0.4508 it/s
d_loss: 0.7891
g_loss: 0.0672
ae_loss: 0.0884
ae_loss1: 0.0420
ae_loss2: 0.0464
cls_loss: 0.2163
Saved state dicts!
@iter: 2600: 1.1080 it/s
d_loss: 0.7892
g_loss: 0.0568
ae_loss: 0.0884
ae_loss1: 0.0419
ae_loss2: 0.0464
cls_loss: 0.2023
Saved state dicts!
@iter: 2700: 1.0935 it/s
d_loss: 0.7944
g_loss: 0.0633
ae_loss: 0.0874
ae_loss1: 0.0417
ae_loss2: 0.0456
cls_loss: 0.1689
Saved state dicts!
@iter: 2800: 1.1477 it/s
d_loss: 0.7881
g_loss: 0.0593
ae_loss: 0.0883
ae_loss1: 0.0420
ae_loss2: 0.0463
cls_loss: 0.2461
Saved state dicts!
@iter: 2900: 1.0975 it/s
d_loss: 0.7827
g_loss: 0.0578
ae_loss: 0.0881
ae_loss1: 0.0420
ae_loss2: 0.0461
cls_loss: 0.2031
Saved state dicts!
@iter: 3000: 1.1653 it/s
d_loss: 0.7700
g_loss: 0.0429
ae_loss: 0.0872
ae_loss1: 0.0415
ae_loss2: 0.0457
cls_loss: 0.1788
Saved state dicts!
@iter: 3100: 1.1468 it/s
d_loss: 0.7795
g_loss: 0.0526
ae_loss: 0.0871
ae_loss1: 0.0416
ae_loss2: 0.0456
cls_loss: 0.1854
Saved state dicts!
test acc on clean examples (%): 89.06
test acc on FGSM examples (%):  88.09
test acc on BIM examples (%):   86.72
test acc on EoT examples (%):   87.11
test acc on BPDA examples (%):  85.94
________epoch 5________
Saved state dicts!
@iter: 3200: 0.4294 it/s
d_loss: 0.7854
g_loss: 0.0411
ae_loss: 0.0871
ae_loss1: 0.0414
ae_loss2: 0.0457
cls_loss: 0.2230
Saved state dicts!
@iter: 3300: 1.0997 it/s
d_loss: 0.7931
g_loss: 0.0448
ae_loss: 0.0886
ae_loss1: 0.0421
ae_loss2: 0.0465
cls_loss: 0.1894
Saved state dicts!
@iter: 3400: 1.1804 it/s
d_loss: 0.7972
g_loss: 0.0430
ae_loss: 0.0869
ae_loss1: 0.0414
ae_loss2: 0.0455
cls_loss: 0.1820
Saved state dicts!
@iter: 3500: 1.1242 it/s
d_loss: 0.7968
g_loss: 0.0418
ae_loss: 0.0868
ae_loss1: 0.0414
ae_loss2: 0.0454
cls_loss: 0.1743
Saved state dicts!
@iter: 3600: 1.1286 it/s
d_loss: 0.7959
g_loss: 0.0436
ae_loss: 0.0864
ae_loss1: 0.0412
ae_loss2: 0.0452
cls_loss: 0.1762
Saved state dicts!
@iter: 3700: 1.1068 it/s
d_loss: 0.7888
g_loss: 0.0466
ae_loss: 0.0868
ae_loss1: 0.0414
ae_loss2: 0.0454
cls_loss: 0.1822
Saved state dicts!
test acc on clean examples (%): 85.94
test acc on FGSM examples (%):  86.33
test acc on BIM examples (%):   83.40
test acc on EoT examples (%):   82.42
test acc on BPDA examples (%):  81.45
________epoch 6________
Saved state dicts!
@iter: 3800: 0.4376 it/s
d_loss: 0.7770
g_loss: 0.0422
ae_loss: 0.0864
ae_loss1: 0.0413
ae_loss2: 0.0451
cls_loss: 0.1941
Saved state dicts!
@iter: 3900: 1.1380 it/s
d_loss: 0.7737
g_loss: 0.0464
ae_loss: 0.0864
ae_loss1: 0.0412
ae_loss2: 0.0452
cls_loss: 0.1577
Saved state dicts!
@iter: 4000: 1.1662 it/s
d_loss: 0.7725
g_loss: 0.0418
ae_loss: 0.0884
ae_loss1: 0.0424
ae_loss2: 0.0460
cls_loss: 0.1821
Saved state dicts!
@iter: 4100: 1.1568 it/s
d_loss: 0.7796
g_loss: 0.0356
ae_loss: 0.0858
ae_loss1: 0.0408
ae_loss2: 0.0449
cls_loss: 0.2048
Saved state dicts!
@iter: 4200: 1.1890 it/s
d_loss: 0.7625
g_loss: 0.0323
ae_loss: 0.0851
ae_loss1: 0.0405
ae_loss2: 0.0446
cls_loss: 0.1528
Saved state dicts!
@iter: 4300: 1.1308 it/s
d_loss: 0.7408
g_loss: 0.0262
ae_loss: 0.0868
ae_loss1: 0.0415
ae_loss2: 0.0453
cls_loss: 0.1700
Saved state dicts!
test acc on clean examples (%): 85.74
test acc on FGSM examples (%):  85.74
test acc on BIM examples (%):   83.20
test acc on EoT examples (%):   82.62
test acc on BPDA examples (%):  81.25
________epoch 7________
Saved state dicts!
@iter: 4400: 0.4671 it/s
d_loss: 0.7516
g_loss: 0.0182
ae_loss: 0.0862
ae_loss1: 0.0411
ae_loss2: 0.0451
cls_loss: 0.1870
Saved state dicts!
@iter: 4500: 1.1542 it/s
d_loss: 0.7335
g_loss: 0.0311
ae_loss: 0.0869
ae_loss1: 0.0416
ae_loss2: 0.0453
cls_loss: 0.1609
Saved state dicts!
@iter: 4600: 1.0770 it/s
d_loss: 0.7283
g_loss: 0.0386
ae_loss: 0.0863
ae_loss1: 0.0413
ae_loss2: 0.0450
cls_loss: 0.1717
Saved state dicts!
@iter: 4700: 1.0912 it/s
d_loss: 0.7392
g_loss: 0.0356
ae_loss: 0.0865
ae_loss1: 0.0415
ae_loss2: 0.0450
cls_loss: 0.1689
Saved state dicts!
@iter: 4800: 1.1709 it/s
d_loss: 0.7476
g_loss: 0.0315
ae_loss: 0.0853
ae_loss1: 0.0408
ae_loss2: 0.0445
cls_loss: 0.1588
Saved state dicts!
@iter: 4900: 1.1379 it/s
d_loss: 0.7425
g_loss: 0.0290
ae_loss: 0.0860
ae_loss1: 0.0412
ae_loss2: 0.0448
cls_loss: 0.1869
Saved state dicts!
test acc on clean examples (%): 86.72
test acc on FGSM examples (%):  85.94
test acc on BIM examples (%):   83.40
test acc on EoT examples (%):   83.01
test acc on BPDA examples (%):  82.62
________epoch 8________
Saved state dicts!
@iter: 5000: 0.4283 it/s
d_loss: 0.7289
g_loss: 0.0291
ae_loss: 0.0854
ae_loss1: 0.0408
ae_loss2: 0.0446
cls_loss: 0.1655
Saved state dicts!
Saved state dicts!
@iter: 5100: 1.1716 it/s
d_loss: 0.7312
g_loss: 0.0285
ae_loss: 0.0863
ae_loss1: 0.0414
ae_loss2: 0.0449
cls_loss: 0.1551
Saved state dicts!
@iter: 5200: 1.1167 it/s
d_loss: 0.7358
g_loss: 0.0246
ae_loss: 0.0856
ae_loss1: 0.0409
ae_loss2: 0.0447
cls_loss: 0.1544
Saved state dicts!
@iter: 5300: 1.1004 it/s
d_loss: 0.7500
g_loss: 0.0202
ae_loss: 0.0851
ae_loss1: 0.0404
ae_loss2: 0.0447
cls_loss: 0.1658
Saved state dicts!
@iter: 5400: 1.1502 it/s
d_loss: 0.7522
g_loss: 0.0173
ae_loss: 0.0865
ae_loss1: 0.0415
ae_loss2: 0.0450
cls_loss: 0.1701
Saved state dicts!
@iter: 5500: 1.1849 it/s
d_loss: 0.7563
g_loss: 0.0063
ae_loss: 0.0855
ae_loss1: 0.0410
ae_loss2: 0.0445
cls_loss: 0.1648
Saved state dicts!
@iter: 5600: 1.0909 it/s
d_loss: 0.7535
g_loss: 0.0084
ae_loss: 0.0852
ae_loss1: 0.0409
ae_loss2: 0.0443
cls_loss: 0.1590
Saved state dicts!
test acc on clean examples (%): 87.89
test acc on FGSM examples (%):  87.70
test acc on BIM examples (%):   86.13
test acc on EoT examples (%):   85.55
test acc on BPDA examples (%):  85.55
________epoch 9________
Saved state dicts!
@iter: 5700: 0.4361 it/s
d_loss: 0.7616
g_loss: -0.0041
ae_loss: 0.0857
ae_loss1: 0.0411
ae_loss2: 0.0446
cls_loss: 0.1550
Saved state dicts!
@iter: 5800: 1.2015 it/s
d_loss: 0.7524
g_loss: 0.0024
ae_loss: 0.0859
ae_loss1: 0.0411
ae_loss2: 0.0448
cls_loss: 0.1647
Saved state dicts!
@iter: 5900: 1.1765 it/s
d_loss: 0.7599
g_loss: -0.0065
ae_loss: 0.0846
ae_loss1: 0.0405
ae_loss2: 0.0441
cls_loss: 0.1413
Saved state dicts!
@iter: 6000: 1.1119 it/s
d_loss: 0.7588
g_loss: -0.0062
ae_loss: 0.0859
ae_loss1: 0.0407
ae_loss2: 0.0452
cls_loss: 0.1763
Saved state dicts!
@iter: 6100: 1.0813 it/s
d_loss: 0.7610
g_loss: -0.0071
ae_loss: 0.0863
ae_loss1: 0.0409
ae_loss2: 0.0454
cls_loss: 0.1979
Saved state dicts!
@iter: 6200: 1.1347 it/s
d_loss: 0.7542
g_loss: -0.0055
ae_loss: 0.0862
ae_loss1: 0.0411
ae_loss2: 0.0451
cls_loss: 0.1790
Saved state dicts!
test acc on clean examples (%): 84.57
test acc on FGSM examples (%):  85.16
test acc on BIM examples (%):   83.98
test acc on EoT examples (%):   83.20
test acc on BPDA examples (%):  82.81
________epoch 10________
Saved state dicts!

I have checked that 'cifar-miss-gau-adv.ipynb' is the same as the original one. However, in the subsequent evaluation process I further encapsulated some functions, which might be a potential cause of your issue. I will check it later.

Rokishii commented 1 week ago

I encountered the same problem; please re-check it.

Files already downloaded and verified
Files already downloaded and verified
@iter: 0: 61.1121 it/s
d_loss: 1.0004
g_loss: 0.4091
ae_loss: 0.4179
ae_loss1: 0.1599
ae_loss2: 0.2580
cls_loss: 6.0966
@iter: 100: 2.7190 it/s
d_loss: 0.9954
g_loss: 0.1163
ae_loss: 0.1613
ae_loss1: 0.0762
ae_loss2: 0.0851
cls_loss: 2.8091
Saved state dicts!
@iter: 200: 3.0092 it/s
d_loss: 0.9670
g_loss: 0.2625
ae_loss: 0.1371
ae_loss1: 0.0680
ae_loss2: 0.0692
cls_loss: 2.6137
Saved state dicts!
@iter: 300: 2.8332 it/s
d_loss: 0.9405
g_loss: 0.3706
ae_loss: 0.1351
ae_loss1: 0.0671
ae_loss2: 0.0680
cls_loss: 2.6406
Saved state dicts!
@iter: 400: 3.4582 it/s
d_loss: 0.8942
g_loss: 0.4934
ae_loss: 0.1328
ae_loss1: 0.0660
ae_loss2: 0.0669
cls_loss: 2.6651
Saved state dicts!
@iter: 500: 2.7816 it/s
d_loss: 0.8676
g_loss: 0.1083
ae_loss: 0.1313
ae_loss1: 0.0653
ae_loss2: 0.0660
cls_loss: 2.6810
Saved state dicts!
@iter: 600: 2.6598 it/s
d_loss: 0.8287
g_loss: 0.1345
ae_loss: 0.1317
ae_loss1: 0.0656
ae_loss2: 0.0661
cls_loss: 2.6508
Saved state dicts!
test acc on clean examples (%): 14.84
test acc on FGSM examples (%):  12.89
test acc on BIM examples (%):   13.67
test acc on EoT examples (%):   14.06
test acc on PGD examples (%):   13.67
________epoch 1________

Rokishii commented 1 week ago

Will you release your pretrained GAN model? The DeepFillv2 checkpoints were trained on Places2 and CelebA-HQ. ref: https://github.com/JiahuiYu/generative_inpainting/tree/v2.0.0

Elysia200207 commented 5 days ago

Same question. The checkpoint in https://github.com/JiahuiYu/generative_inpainting/tree/v2.0.0 cannot be accessed. Also, I would like to confirm whether it is the link in the screenshot below. [screenshot: checkpoint link]

Rokishii commented 5 days ago

Try this link, but it won't help reach the accuracy the author claims: https://github.com/nipponjo/deepfillv2-pytorch

Elysia200207 commented 5 days ago

Yes, I cannot reproduce the results you provided using the weights in the README.

Rokishii commented 5 days ago

According to the author above, he will "check it later".

glin2022 commented 4 days ago

Hi all, I have identified the cause of the problem: the output range of the GAN is [-1, 1], whereas the classifier expects inputs in [0, 1]. This mismatch led to the low-accuracy issue.

You can replace every input `x` of the classifier with `(x + 1) / 2.` (a small helper that does this is sketched after the list):

    line 82:  outputs_cls = self.res(adv_x2)
        →     outputs_cls = self.res((adv_x2 + 1) / 2.)
    line 83:  outputs_cls = self.res(x2)
        →     outputs_cls = self.res((x2 + 1) / 2.)
    lines 331, 337, 343, 349, 355:  adv_predicted4 = x2.clone().detach()
        →     adv_predicted4 = (x2.clone().detach() + 1) / 2.
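
Equivalently, a small helper makes the rescaling explicit (a sketch only; `to_cls_range` is not a name from the repository):

    import torch

    def to_cls_range(x: torch.Tensor) -> torch.Tensor:
        """Map a GAN output from [-1, 1] to the [0, 1] range the classifier expects."""
        return (x + 1) / 2.

    # Sanity check on the endpoints of the range:
    assert torch.allclose(to_cls_range(torch.tensor([-1.0, 1.0])),
                          torch.tensor([0.0, 1.0]))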

That said, I am sure the results at the time were obtained with the current version of the code, and the complete training log has also been uploaded to GitHub. Therefore, I think this might not be the main cause, and I have not modified the code on GitHub. In settings without attacks, it is very easy to reach a standard accuracy of over 90% when fine-tuning the GAN with only the classification loss. However, a current bug causes the loss to fail to converge even when there are no attacks. The issue might have been caused by uploading the wrong ResNet checkpoint, or something else.
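
For illustration, fine-tuning the GAN with only the classification loss in the no-attack setting amounts to a training step like the following (all names are illustrative, not the notebook's actual loop; the classifier is assumed fixed):

    import torch.nn.functional as F

    def finetune_step(purifier, classifier, optimizer, x, y):
        # Purify clean inputs and update the purifier so the classifier
        # labels them correctly; note the [-1, 1] -> [0, 1] rescaling from above.
        optimizer.zero_grad()
        x_pur = (purifier(x) + 1) / 2.
        loss = F.cross_entropy(classifier(x_pur), y)
        loss.backward()
        optimizer.step()
        return loss.item()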

Therefore, I have temporarily removed this file and uploaded the checkpoints of the entire model (https://drive.google.com/drive/folders/17qP_PWL1VXTnrZoRWhaESjvcIqkggArg?usp=share_link). Since many of these are over a year old, I am not entirely sure they represent the best results from that time. For the six settings, I trained four types of checkpoints; RT3 is just a resampled version of RT2, so they share the same checkpoint.