fra31 / auto-attack

Code relative to "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"
https://arxiv.org/abs/2003.01690
MIT License

AutoAttack and APGD-T #52


Yeez-lee commented 3 years ago

Hi, I am a beginner and still feel a little bit confused about AutoAttack and the targeted APGD-T. (1) For the standard AutoAttack, are all images attacked by the 4 attacks respectively, with the average robust accuracy over the 4 attacks computed at the end? And in AutoAttack, are the adversarial images for each attack saved separately? (2) For the targeted APGD-T, how is the target label found? I see that the number of target labels is equal to the total number of classes minus one for CIFAR-10. If I want to use it for ImageNet, should I set n_target_classes = 999 or any number in 1-999? What is the principle behind the target setting?

Looking forward to your help! Thanks!

fra31 commented 3 years ago

Hi,

there are a few options, described here. In particular, run_standard_evaluation runs an attack on a point only if none of the previous methods has been successful, which is equivalent to taking the worst case over the four attacks. If instead you use run_standard_evaluation_individual, all attacks are run on the full input batch and the results are returned separately (this is more time-consuming).
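
For concreteness, a minimal usage sketch of the two modes, following the interface shown in the README (the model and the x_test / y_test tensors are placeholders):

```python
from autoattack import AutoAttack

# model: a PyTorch classifier returning logits; x_test, y_test: clean test data
adversary = AutoAttack(model, norm='Linf', eps=8/255, version='standard')

# worst-case mode: each attack runs only on the points not yet broken
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)

# individual mode: every attack runs on the full batch, results kept per attack
dict_adv = adversary.run_standard_evaluation_individual(x_test, y_test, bs=250)
```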

The target classes for APGD-T are chosen, for each point, as the most likely ones (those with the highest logits) excluding the correct one. We use 9 as the standard value since the commonly used datasets have at least 10 classes. We use the same value for ImageNet as well, to keep a constant computational budget and because we observed it to be effective, but in principle any value in [1, 999] can be used. Also, we use targeted losses in the context of untargeted attacks since these provide more diverse and, when multiple restarts are available, stronger attacks.
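
As a minimal sketch of that selection rule (not the repo's actual code; select_target_classes is a hypothetical helper):

```python
import torch

def select_target_classes(logits, y, n_target_classes=9):
    """For each input, pick the most likely classes excluding the true one."""
    sorted_idx = logits.sort(dim=1, descending=True).indices  # classes by logit
    targets = []
    for i in range(logits.size(0)):
        candidates = sorted_idx[i][sorted_idx[i] != y[i]]  # drop the true class
        targets.append(candidates[:n_target_classes])
    return torch.stack(targets)  # shape (batch, n_target_classes)
```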

Hope this helps, and let me know if you have further questions!

Yeez-lee commented 3 years ago

Thanks for the response! I'm clearer than before but still have some questions.

(1) I know that when I use run_standard_evaluation_individual, the results of the 4 attacks are saved separately. But for run_standard_evaluation, do you mean that a point counts as successfully attacked only when all 4 attacks perturb it? For example, given 100 clean images, suppose untargeted APGD-CE (no restarts) successfully perturbs 80 of them, targeted APGD-DLR (9 target classes) perturbs 70, targeted FAB (9 target classes) perturbs 65, and Square Attack (5000 queries) perturbs 60. If 45 images are successfully perturbed by all 4 attacks together and 90 are perturbed by at least one of the 4 attacks, does AutoAttack regard the 45 or the 90 as the final perturbed images? Or is it the other case: each image is first attacked by untargeted APGD-CE (no restarts); if that succeeds, the remaining 3 attacks are skipped, but if it fails, targeted APGD-DLR (9 target classes) attacks the image, and so on. If all 4 attacks fail in turn, the image is robust under the model, and the robust accuracy is (number of such robust images) / (total number).

(2) I think APGD-T only uses the top-k (k=9) labels to generate adversarial examples, as in this code: https://github.com/fra31/auto-attack/blob/feb85002a8b0e994a78cba02302f576a77e7ea2b/autoattack/autopgd_pt.py#L414 So why do we need the for loop instead of directly picking the best target among the top-9 labels? And is only the best case saved, or are all 9 cases of the loop saved?

(3) In the code at https://github.com/fra31/auto-attack/blob/feb85002a8b0e994a78cba02302f576a77e7ea2b/autoattack/autopgd_pt.py#L48, why is the norm constraint applied twice, at lines 56 and 113? Normally I apply the norm only once, to constrain the generated examples in the last few steps, like the second usage below. https://github.com/fra31/auto-attack/blob/feb85002a8b0e994a78cba02302f576a77e7ea2b/autoattack/autopgd_pt.py#L56 https://github.com/fra31/auto-attack/blob/feb85002a8b0e994a78cba02302f576a77e7ea2b/autoattack/autopgd_pt.py#L113

Thanks for your help again!

fra31 commented 3 years ago

We consider a point successfully misclassified if any of the four attacks finds an adversarial perturbation, so in your example it would be 90 points. And it works as you described: the attacks are run sequentially, each only on the points which haven't been successfully attacked by a previous one, and the robust accuracy is given as the percentage of points robust to all attacks.
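
A minimal sketch of that sequential worst-case logic (a simplification, not the repo's actual implementation; attacks stands for callables like APGD-CE, APGD-T, FAB-T, Square):

```python
import torch

def worst_case_eval(attacks, model, x, y):
    """Run each attack only on points no earlier attack has broken."""
    robust = torch.ones(x.size(0), dtype=torch.bool)   # not-yet-fooled mask
    x_adv = x.clone()
    for attack in attacks:
        if not robust.any():
            break
        idx = robust.nonzero(as_tuple=True)[0]
        x_try = attack(model, x[idx], y[idx])           # attack remaining points
        fooled = model(x_try).argmax(dim=1) != y[idx]   # flipped predictions
        x_adv[idx[fooled]] = x_try[fooled]              # keep successful examples
        robust[idx[fooled]] = False
    robust_acc = robust.float().mean().item()           # fraction robust to all
    return x_adv, robust_acc
```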

About APGD-T, we use different target classes so that we have different losses as objective functions for the maximization scheme. Also in this case, only one adversarial image per input is saved, if one is found.
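
For reference, the targeted DLR loss from the paper, which changes with each target class t, looks roughly like this (a sketch modulo numerical details; it assumes at least 4 classes):

```python
import torch

def dlr_loss_targeted(logits, y, y_target):
    # sort logits decreasingly to get pi_1, pi_3, pi_4 for the denominator
    z_sorted = logits.sort(dim=1, descending=True).values
    z_y = logits.gather(1, y.unsqueeze(1)).squeeze(1)         # true-class logit
    z_t = logits.gather(1, y_target.unsqueeze(1)).squeeze(1)  # target-class logit
    denom = z_sorted[:, 0] - 0.5 * (z_sorted[:, 2] + z_sorted[:, 3])
    return -(z_y - z_t) / (denom + 1e-12)                     # maximized per target
```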

The first line you mentioned generates the random starting point, which should already lie in the feasible set; the later projection then keeps the iterates feasible after each update step.
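
The same two projections appear in a plain PGD loop; a minimal L-inf sketch (generic PGD, not this repo's APGD):

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # (1) random start: the initial point must already lie in the feasible set
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0., 1.)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # (2) projection after every step keeps the iterates feasible
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0., 1.)
    return x_adv.detach()
```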

Hope this helps!