fra31 / auto-attack

Code relative to "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"
https://arxiv.org/abs/2003.01690
MIT License

4 classes model; targeted attacks are different #79

jS5t3r opened this issue 2 years ago

jS5t3r commented 2 years ago

Follow up from https://github.com/fra31/auto-attack/issues/76

Since I have a model trained on only 4 classes, I adapted the number of target classes.

I changed this block: https://github.com/fra31/auto-attack/blob/6482e4d6fbeeb51ae9585c41b16d50d14576aadc/autoattack/autoattack.py#L265 to

        elif version == 'standard_4':
            # same attacks as 'standard', but with the number of target classes
            # reduced to 3 (= num_classes - 1 for a 4-class model)
            self.attacks_to_run = ['apgd-ce', 'apgd-t', 'fab-t', 'square']
            if self.norm in ['Linf', 'L2']:
                self.apgd.n_restarts = 1
                self.apgd_targeted.n_target_classes = 3  # was 9
            elif self.norm in ['L1']:
                self.apgd.use_largereps = True
                self.apgd_targeted.use_largereps = True
                self.apgd.n_restarts = 5
                self.apgd_targeted.n_target_classes = 3  # was 5
            self.fab.n_restarts = 1
            self.apgd_targeted.n_restarts = 1
            self.fab.n_target_classes = 3  # was 9
            self.square.n_queries = 5000

where n_target_classes = num_classes - 1 = 4 - 1 = 3, so that I don't get any warning.
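(As a side note, I think the same setup could also be written without editing autoattack.py, by using the 'custom' version and setting the target-class counts on the adversary object directly. A rough sketch of what I mean, where model, x_test, y_test and eps=8/255 are just placeholders for my setup:)

    from autoattack import AutoAttack

    # sketch: 'custom' version with the same four attacks, for a 4-class model
    adversary = AutoAttack(model, norm='Linf', eps=8/255, version='custom',
                           attacks_to_run=['apgd-ce', 'apgd-t', 'fab-t', 'square'])
    adversary.apgd.n_restarts = 1
    adversary.apgd_targeted.n_restarts = 1
    adversary.apgd_targeted.n_target_classes = 3  # num_classes - 1
    adversary.fab.n_restarts = 1
    adversary.fab.n_target_classes = 3            # num_classes - 1
    adversary.square.n_queries = 5000

    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)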

It seems that APGD-CE produces most of the successful perturbations.

    using standard_4 version including apgd-ce, apgd-t, fab-t, square
    initial accuracy: 93.00%
    apgd-ce - 1/1 - 458 out of 465 successfully perturbed
    robust accuracy after APGD-CE: 1.40% (total time 112.7 s)
    apgd-t - 1/1 - 0 out of 7 successfully perturbed
    robust accuracy after APGD-T: 1.40% (total time 122.2 s)
    fab-t - 1/1 - 0 out of 7 successfully perturbed
    robust accuracy after FAB-T: 1.40% (total time 133.2 s)
    square - 1/1 - 0 out of 7 successfully perturbed
    robust accuracy after SQUARE: 1.40% (total time 169.0 s)
    max Linf perturbation: 0.00784, nan in tensor: 0, max: 1.00000, min: 0.00000
    robust accuracy: 1.40%

So I don't think I am doing anything wrong here, right?

fra31 commented 2 years ago

Looks fine to me. As a sanity check, you can run apgd-ce as the last attack by changing the order of the self.attacks_to_run list.
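For instance (sketch, replacing the list in the standard_4 branch above):

            # sanity check: apgd-ce runs last, so apgd-t / fab-t / square are
            # evaluated on the still-unperturbed points first
            self.attacks_to_run = ['apgd-t', 'fab-t', 'square', 'apgd-ce']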

jS5t3r commented 2 years ago

On page 4 of this publication, https://arxiv.org/abs/2112.01601, Figure 3 shows an evaluation where the classifier is trained on 1000 classes (standard configuration: https://github.com/adverML/auto-attack/blob/aec817234bc565004fe3e9bee9afe41e931ca9ad/autoattack/autoattack.py#L277). Figure 4 shows an evaluation where the classifier is trained on 4 classes, for which I used this configuration: https://github.com/adverML/auto-attack/blob/aec817234bc565004fe3e9bee9afe41e931ca9ad/autoattack/autoattack.py#L293

I am just wondering, because I would have expected the plots to look more similar.