cassidylaidlaw opened this issue 2 years ago
Hi,
Thank you for your interest in our work and for sharing your findings!
We have verified that your findings are correct (please see our reproduction of your experiments at the end). This exposes a limitation of our method: it works well against cheap attacks (e.g., a 20-step APGD attack) but shows a noticeable performance drop against expensive ones (e.g., a 100-step APGD attack). We are glad to see this weakness of our method discovered and discussed here, because that is how the community moves forward. Thank you again for sharing these findings with us!
However, this limitation doesn't mean our method fails. It is still the only one with almost no drop in clean accuracy while achieving high adversarial robustness against cheap (and arguably more practical) adversarial attacks.
Which defense method to use really depends on the practical requirements.
On one hand, in some practical cases, the attacker is likely to face query limitations: "A limit on the number of queries can be a result of limits on other resources, such as a time limit if inference time is a bottleneck or a monetary limit if the attacker incurs a cost for each query" [1]. If one wants high clean accuracy and high robustness against cheap/query-limited attacks (e.g., 20-step APGD), then our method is a much better choice than previous methods. This is the problem our method solves.
On the other hand, I fully agree that in some cases the attacker can have an unlimited budget to query the model (e.g., 100-step APGD), in which case we shouldn't "restrict the computational power of an adversary artificially". This is a more challenging scenario than the previous one: the model should achieve high clean accuracy and high robustness against both cheap and expensive attacks. Our method can't fulfill this goal, nor can any other existing method. This remains an open problem requiring further research.
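To make the budget effect concrete, here is a toy, self-contained sketch (not the paper's model — a hypothetical linear scorer with hand-picked numbers) of a sign-gradient L∞ attack where a 20-step query budget fails to find an adversarial example but a 100-step budget succeeds:

```python
def pgd_linf(w, b, x0, step, eps, n_steps):
    """Sign-gradient L-inf attack on a linear scorer s(x) = w.x + b.

    Each step moves every coordinate by `step` against the score's
    gradient, then projects back into the eps-ball around x0.
    """
    x = list(x0)
    for _ in range(n_steps):
        for i in range(len(x)):
            g = 1.0 if w[i] > 0 else -1.0 if w[i] < 0 else 0.0
            x[i] = max(x0[i] - eps, min(x0[i] + eps, x[i] - step * g))
    return x

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hand-picked toy numbers: the clean score is 2.5 and each attack step
# can lower it by at most step * sum(|w|) = 0.05, so crossing the
# decision boundary (score < 0) needs more than 50 steps.
w, b = [1.0] * 5, 0.0
x0 = [0.5] * 5
weak = pgd_linf(w, b, x0, step=0.01, eps=1.0, n_steps=20)
strong = pgd_linf(w, b, x0, step=0.01, eps=1.0, n_steps=100)
print(score(w, b, weak) > 0)    # True: the 20-step attacker fails
print(score(w, b, strong) > 0)  # False: the 100-step attacker succeeds
```

The point of the toy is only that "robust at 20 steps" and "robust at 100 steps" are different claims: the same model, same eps, and same attack can flip from robust to broken purely by raising the iteration budget.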
[1] Black-box Adversarial Attacks with Limited Queries and Information.
To reproduce your findings:
Robustness against full AutoAttack: In the paper, we used a cheap version of AutoAttack with two of the four attacks in the full version, each run for 20 steps. Running the full AutoAttack was beyond our computational budget since it is too time-consuming. To verify your results, we ran the full AutoAttack on a small subset of the ImageNet validation set. The results are consistent with your finding: our model's robustness under the full AA is much lower than under the cheap AA.
Robustness against 100-step APGD: We confirm your finding: our ResNet50 achieves around 5% accuracy under a 100-step APGD attack.
Since the authors have acknowledged this limitation, could it be noted somewhere for the community (at least in this GitHub repository and in an updated version of the arXiv paper)?
Hello, I was evaluating your pretrained model and I was able to reproduce some results from the paper using your code, e.g., the 12% accuracy under APGD-CE with 20 steps.
However, when I evaluate with full AutoAttack (using the latest code at https://github.com/fra31/auto-attack) the accuracy drops to 1.3%:
The modified version of test.py I used to run this evaluation is here: https://pastebin.com/iv0DP5Vh
Even just increasing the number of iterations of APGD-CE to 100 seems to significantly decrease the robust accuracy:
Is my evaluation wrong? Otherwise, it seems like the conclusions of this paper don't hold up. Although the robustness of the NoFrost model is better against fewer iterations of APGD, "it is not meaningful to restrict the computational power of an adversary artificially (e.g., to fewer than several thousand attack iterations)" (Athalye, Carlini, & Wagner 2018).