amazon-science / normalizer-free-robust-training

Official implementation of "Removing Batch Normalization Boosts Adversarial Training" (ICML'22)
https://proceedings.mlr.press/v162/wang22ap/wang22ap.pdf
Apache License 2.0

AutoAttack reduces accuracy of NoFrost ResNet-50 to 1.3% #2

Open cassidylaidlaw opened 2 years ago

cassidylaidlaw commented 2 years ago

Hello, I evaluated your pretrained model and was able to reproduce some results from the paper using your code, e.g., the 12% accuracy under APGD-CE with 20 steps.

$ python test.py --mode apgd --steps 20 --eps 8 --gpu 2 --md nf_resnet50 --ckpt_path data --ckpt_version NoFrost_ResNet50 --drp data
...
[NoFrost_ResNet50] APGD-ce-20-0.0314: acc 0.1203 | time 1681.9198

However, when I evaluate with full AutoAttack (using the latest code at https://github.com/fra31/auto-attack) the accuracy drops to 1.3%:

$ python test.py --mode autoattack --eps 8 --gpu 0 --md nf_resnet50 --ckpt_path data --ckpt_version NoFrost_ResNet50 --drp data --tb 200
...
[NoFrost_ResNet50] AutoAttack-0.0314: acc 0.0128 | time 37996.2781

The modified version of test.py I used to run this evaluation is here: https://pastebin.com/iv0DP5Vh
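
Roughly, the modification wires in the fra31/auto-attack package as sketched below (a simplified sketch with placeholder model/data arguments, not the exact code; the pastebin above has the actual script):

from autoattack import AutoAttack  # from https://github.com/fra31/auto-attack

def eval_full_autoattack(model, x_test, y_test, eps=8 / 255, batch_size=200):
    # `model`, `x_test`, `y_test` are placeholders for the loaded NoFrost
    # ResNet-50 (in eval mode) and the evaluation images/labels in [0, 1].
    adversary = AutoAttack(model, norm='Linf', eps=eps, version='standard')
    # Runs the standard suite (APGD-CE, APGD-T, FAB-T, Square), prints
    # per-attack robust accuracy, and returns the adversarial examples.
    return adversary.run_standard_evaluation(x_test, y_test, bs=batch_size)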

Even just increasing the number of iterations of APGD-CE to 100 seems to significantly decrease the robust accuracy:

$ python test.py --mode apgd --steps 100 --eps 8 --gpu 2 --md nf_resnet50 --ckpt_path data --ckpt_version NoFrost_ResNet50 --drp data
...
[NoFrost_ResNet50] APGD-ce-100-0.0314: acc 0.0529 | time 7280.7155
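
For reference, the step count maps onto the auto-attack API roughly like this (again a minimal sketch with placeholder arguments, not my exact script):

from autoattack import AutoAttack  # from https://github.com/fra31/auto-attack

def eval_apgd_ce(model, x_test, y_test, eps=8 / 255, n_iter=100, batch_size=200):
    # Run only APGD-CE via AutoAttack's custom mode, with an adjustable
    # iteration count (20 vs. 100 in the runs above).
    adversary = AutoAttack(model, norm='Linf', eps=eps,
                           version='custom', attacks_to_run=['apgd-ce'])
    adversary.apgd.n_iter = n_iter
    adversary.apgd.n_restarts = 1
    return adversary.run_standard_evaluation(x_test, y_test, bs=batch_size)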

Is my evaluation wrong? Otherwise, it seems like the conclusions of this paper don't hold up. Although the robustness of the NoFrost model is better against fewer iterations of APGD, "it is not meaningful to restrict the computational power of an adversary artificially (e.g., to fewer than several thousand attack iterations)" (Athalye, Carlini, & Wagner 2018).

htwang14 commented 2 years ago

Hi,

Thank you for your interest in our work and for sharing your findings!

We have verified your findings (please see our reproduction of your experiments at the end). This reveals a limitation of our method: it works well against cheap attacks (e.g., 20-step APGD) but suffers a noticeable performance drop against expensive ones (e.g., 100-step APGD). We are glad to see this weakness of our method discovered and discussed here, because that is how the community moves forward. Thank you again for sharing these findings with us!

However, this limitation doesn't mean our method fails. It is still the only one that suffers almost no drop in clean accuracy while achieving high adversarial robustness against cheap (and arguably more practical) adversarial attacks.

It really depends on the practical requirements when choosing which defense method to use.

On one hand, in some practical cases the attacker is likely to have query limitations. "A limit on the number of queries can be a result of limits on other resources, such as a time limit if inference time is a bottleneck or a monetary limit if the attacker incurs a cost for each query" [1]. If one wants high clean accuracy and high robustness against cheap, query-limited attacks (e.g., 20-step APGD), then our method is a much better choice than previous methods. This is the problem our method solves.

On the other hand, I fully agree that in some cases the attacker can have an unlimited budget to query the model (e.g., 100-step APGD), in which case we shouldn't "restrict the computational power of an adversary artificially". This is a more challenging scenario than the previous one: the model should achieve high clean accuracy and high robustness against both cheap and expensive attacks. Our method cannot achieve this, and neither can any other existing method. This is still an open problem requiring further research.

[1] Ilyas et al., "Black-box Adversarial Attacks with Limited Queries and Information," ICML 2018.

To reproduce your findings:

ZhengyuZhao commented 2 years ago

Since the authors have acknowledged this limitation, will it be noted somewhere for the community, at least in this GitHub repository and in an updated version of the arXiv paper?