inkawhich closed this issue 5 years ago.
Note that 32.7% refers to robust accuracy under the FGSM attack. FGSM is not a strong attack, so on its own it is not sufficient to properly evaluate the robust accuracy of the model. As you can see in Table 5, PGD reduces the accuracy of the same model to 3.5%. Meanwhile, the worst-case black-box accuracy in Table 3 is 21.3%. So white-box attacks are indeed more powerful than black-box attacks in this setting; the discrepancy is an artifact of FGSM being a weak attack.
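To make the FGSM vs. PGD distinction concrete, here is a minimal NumPy sketch under an L-infinity budget. FGSM takes one signed-gradient step of size eps; PGD takes many smaller signed-gradient steps and projects back into the eps-ball after each one. The loss is a hypothetical toy function standing in for a model's loss, not a CIFAR-10 network, and the sketch omits clipping to the valid pixel range and the random start that some PGD variants use.

```python
import numpy as np


def fgsm(x, grad_fn, eps):
    """Fast Gradient Sign Method: a single signed-gradient step of size eps."""
    return x + eps * np.sign(grad_fn(x))


def pgd(x, grad_fn, eps, step_size, n_steps):
    """Projected gradient ascent under an L-infinity budget of eps.

    Takes repeated signed-gradient steps and clips back into [x - eps, x + eps]
    after each one. The random start used in some PGD variants is omitted.
    """
    x_adv = x.copy()
    for _ in range(n_steps):
        x_adv = x_adv + step_size * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the L-inf ball
    return x_adv


# Hypothetical toy loss: maximized at 0.3 in every coordinate, well inside
# the eps = 1.0 ball, so a single full-size FGSM step overshoots the maximum.
def loss(x):
    return -np.sum((x - 0.3) ** 2)


def loss_grad(x):
    return -2.0 * (x - 0.3)


x0 = np.zeros(2)
eps = 1.0
x_fgsm = fgsm(x0, loss_grad, eps)
x_pgd = pgd(x0, loss_grad, eps, step_size=0.05, n_steps=40)
print(loss(x_fgsm), loss(x_pgd))
```

Here FGSM jumps to the corner of the ball (loss about -0.98), while PGD ends near the maximizer with a loss close to 0. The toy case only illustrates the general point that a one-step linear approximation can badly underestimate the worst case, which is why an iterative attack is the more reliable measure of robust accuracy.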
I agree that PGD should be a more powerful attack and would expect its accuracies to be lower than FGSM's. However, regardless of FGSM's weaknesses, I was not expecting a transfer-based black-box FGSM attack to be more powerful than a white-box FGSM attack in any setting, let alone by more than 10 percentage points (21.3% vs. 32.7%).
I agree that this is an interesting observation. However, I would be hesitant about drawing conclusions based on comparing the relative performance of a weak white-box attack to a weak black-box attack.
Closing for now; I will reopen if I have any revelations. Thanks.
I am working on recreating some of the results from your paper, specifically some of the CIFAR-10 transfer results. I have noticed something in the tables that doesn't seem intuitive, and I was wondering if you could comment on it.
In Table 5 [Model = Wide-Natural, Adversary = FGSM], the white-box accuracy of the model under attack appears to be 32.7%. In Table 3 [Target = Wide-Natural, Source = Wide-Natural], the accuracy of the target model under the FGSM attack is recorded as 21.3%. This surprises me because it means the black-box attack is more powerful than the white-box attack, which I have never observed before. Do you have any intuition or explanation for this?
Thank you.
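For intuition on why one normally expects white-box FGSM to beat a transfer attack, here is a minimal NumPy sketch with linear classifiers, where the comparison can be reasoned about exactly. Adversarial examples are crafted with FGSM on a source model and then evaluated on the target model; white-box corresponds to source = target. The models, data, and the noisy "surrogate" weights are hypothetical stand-ins, not the networks from the paper.

```python
import numpy as np


def fgsm_linear(x, y, w, eps):
    """FGSM against a linear classifier sign(x @ w) with logistic loss.

    For labels y in {-1, +1}, the gradient of log(1 + exp(-y * w @ x)) with
    respect to x is a positive multiple of -y * w, so its sign is -y * sign(w).
    """
    return x - eps * y[:, None] * np.sign(w)[None, :]


def accuracy(w, x, y):
    """Fraction of rows of x that the linear model w labels correctly."""
    return float(np.mean(np.sign(x @ w) == y))


rng = np.random.default_rng(0)
d, n, eps = 20, 500, 0.1
w_target = rng.normal(size=d)
# Hypothetical surrogate: a perturbed copy of the target's weights, playing
# the role of a separately trained source model in a transfer attack.
w_source = w_target + 0.5 * rng.normal(size=d)
x = rng.normal(size=(n, d))
y = np.sign(x @ w_target)  # labels chosen so the target is 100% accurate on clean data

clean_acc = accuracy(w_target, x, y)
white_box_acc = accuracy(w_target, fgsm_linear(x, y, w_target, eps), y)
transfer_acc = accuracy(w_target, fgsm_linear(x, y, w_source, eps), y)
print(clean_acc, white_box_acc, transfer_acc)
```

For a linear model, white-box FGSM is exactly the worst-case L-infinity perturbation (it lowers each margin by eps times the L1 norm of the weights), so `white_box_acc` can never exceed `transfer_acc` in this sketch. For a deep network, FGSM's one-step linearization is only an approximation, so that ordering is no longer guaranteed.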