"Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"\ Francesco Croce, Matthias Hein\ ICML 2020\ https://arxiv.org/abs/2003.01690
We propose to use an ensemble of four diverse attacks to reliably evaluate robustness:
Note: we fix all the hyperparameters of the attacks, so no tuning is required to test every new classifier.
We here list adversarial defenses, for many threat models, recently proposed and evaluated with the standard version of AutoAttack (AA), including
See below for the more expensive AutoAttack+ (AA+) and more options.
For each model we report its source (i.e. whether it is publicly available, whether we received it from the authors, or whether we retrained it), the architecture, the clean accuracy and the reported robust accuracy (note that the latter might be calculated on a subset of the test set or on different models trained with the same defense). The robust accuracy for AA is computed on the full test set.
We plan to add new models as they appear and are made available. Feel free to suggest new defenses to test!
To have a model added: please check here.
Checkpoints: many of the evaluated models are available and easily accessible at this Model Zoo.
The robust accuracy is evaluated at eps = 8/255, except for the models marked with * for which eps = 0.031, where eps is the maximal Linf-norm allowed for the adversarial perturbations. The eps used is the same as in the original papers.
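As a minimal sketch of the Linf threat model, assuming images are flat lists of pixel values in [0, 1] (the helpers `project_linf` and `is_valid_linf` are hypothetical, not part of AutoAttack), the following projects a candidate adversarial image onto the set of allowed perturbations and checks that it is valid. It also shows that eps = 0.031 is slightly smaller than 8/255.

```python
def project_linf(x, x_adv, eps):
    """Project x_adv onto the Linf ball of radius eps around x, intersected with [0, 1]."""
    projected = []
    for xi, ai in zip(x, x_adv):
        # Clip each coordinate to [xi - eps, xi + eps], then to the image box [0, 1].
        ai = min(max(ai, xi - eps), xi + eps)
        projected.append(min(max(ai, 0.0), 1.0))
    return projected


def is_valid_linf(x, x_adv, eps, tol=1e-12):
    """Check that x_adv is a valid image within Linf distance eps of x (tol absorbs float error)."""
    in_box = all(0.0 <= a <= 1.0 for a in x_adv)
    dist = max(abs(a - b) for a, b in zip(x_adv, x))
    return in_box and dist <= eps + tol


EPS_DEFAULT = 8 / 255  # about 0.03137
EPS_STAR = 0.031       # used for the models marked with *

x = [0.0, 0.5, 1.0]
x_adv = project_linf(x, [0.2, 0.4, 1.3], EPS_DEFAULT)
print(x_adv, is_valid_linf(x, x_adv, EPS_DEFAULT), EPS_STAR < EPS_DEFAULT)
```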
Note: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training).
Update: this is no longer maintained, but an up-to-date leaderboard is available in RobustBench.
# | paper | model | architecture | clean | report. | AA |
---|---|---|---|---|---|---|
1 | (Gowal et al., 2020)‡ | available | WRN-70-16 | 91.10 | 65.87 | 65.88 |
2 | (Gowal et al., 2020)‡ | available | WRN-28-10 | 89.48 | 62.76 | 62.80 |
3 | (Wu et al., 2020a)‡ | available | WRN-34-15 | 87.67 | 60.65 | 60.65 |
4 | (Wu et al., 2020b)‡ | available | WRN-28-10 | 88.25 | 60.04 | 60.04 |
5 | (Carmon et al., 2019)‡ | available | WRN-28-10 | 89.69 | 62.5 | 59.53 |
6 | (Gowal et al., 2020) | available | WRN-70-16 | 85.29 | 57.14 | 57.20 |
7 | (Sehwag et al., 2020)‡ | available | WRN-28-10 | 88.98 | - | 57.14 |
8 | (Gowal et al., 2020) | available | WRN-34-20 | 85.64 | 56.82 | 56.86 |
9 | (Wang et al., 2020)‡ | available | WRN-28-10 | 87.50 | 65.04 | 56.29 |
10 | (Wu et al., 2020b) | available | WRN-34-10 | 85.36 | 56.17 | 56.17 |
11 | (Alayrac et al., 2019)‡ | available | WRN-106-8 | 86.46 | 56.30 | 56.03 |
12 | (Hendrycks et al., 2019)‡ | available | WRN-28-10 | 87.11 | 57.4 | 54.92 |
13 | (Pang et al., 2020c) | available | WRN-34-20 | 86.43 | 54.39 | 54.39 |
14 | (Pang et al., 2020b) | available | WRN-34-20 | 85.14 | - | 53.74 |
15 | (Cui et al., 2020)* | available | WRN-34-20 | 88.70 | 53.57 | 53.57 |
16 | (Zhang et al., 2020b) | available | WRN-34-10 | 84.52 | 54.36 | 53.51 |
17 | (Rice et al., 2020) | available | WRN-34-20 | 85.34 | 58 | 53.42 |
18 | (Huang et al., 2020)* | available | WRN-34-10 | 83.48 | 58.03 | 53.34 |
19 | (Zhang et al., 2019b)* | available | WRN-34-10 | 84.92 | 56.43 | 53.08 |
20 | (Cui et al., 2020)* | available | WRN-34-10 | 88.22 | 52.86 | 52.86 |
21 | (Qin et al., 2019) | available | WRN-40-8 | 86.28 | 52.81 | 52.84 |
22 | (Chen et al., 2020a) | available | RN-50 (x3) | 86.04 | 54.64 | 51.56 |
23 | (Chen et al., 2020b) | available | WRN-34-10 | 85.32 | 51.13 | 51.12 |
24 | (Sitawarin et al., 2020) | available | WRN-34-10 | 86.84 | 50.72 | 50.72 |
25 | (Engstrom et al., 2019) | available | RN-50 | 87.03 | 53.29 | 49.25 |
26 | (Kumari et al., 2019) | available | WRN-34-10 | 87.80 | 53.04 | 49.12 |
27 | (Mao et al., 2019) | available | WRN-34-10 | 86.21 | 50.03 | 47.41 |
28 | (Zhang et al., 2019a) | retrained | WRN-34-10 | 87.20 | 47.98 | 44.83 |
29 | (Madry et al., 2018) | available | WRN-34-10 | 87.14 | 47.04 | 44.04 |
30 | (Pang et al., 2020a) | available | RN-32 | 80.89 | 55.0 | 43.48 |
31 | (Wong et al., 2020) | available | RN-18 | 83.34 | 46.06 | 43.21 |
32 | (Shafahi et al., 2019) | available | WRN-34-10 | 86.11 | 46.19 | 41.47 |
33 | (Ding et al., 2020) | available | WRN-28-4 | 84.36 | 47.18 | 41.44 |
34 | (Atzmon et al., 2019)* | available | RN-18 | 81.30 | 43.17 | 40.22 |
35 | (Moosavi-Dezfooli et al., 2019) | authors | WRN-28-10 | 83.11 | 41.4 | 38.50 |
36 | (Zhang & Wang, 2019) | available | WRN-28-10 | 89.98 | 60.6 | 36.64 |
37 | (Zhang & Xu, 2020) | available | WRN-28-10 | 90.25 | 68.7 | 36.45 |
38 | (Jang et al., 2019) | available | RN-20 | 78.91 | 37.40 | 34.95 |
39 | (Kim & Wang, 2020) | available | WRN-34-10 | 91.51 | 57.23 | 34.22 |
40 | (Wang & Zhang, 2019) | available | WRN-28-10 | 92.80 | 58.6 | 29.35 |
41 | (Xiao et al., 2020)* | available | DenseNet-121 | 79.28 | 52.4 | 18.50 |
42 | (Jin & Rinard, 2020) | available | RN-18 | 90.84 | 71.22 | 1.35 |
43 | (Mustafa et al., 2019) | available | RN-110 | 89.16 | 32.32 | 0.28 |
44 | (Chan et al., 2020) | retrained | WRN-34-10 | 93.79 | 15.5 | 0.26 |
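The difference between the "report." and "AA" columns indicates by how many percentage points the original evaluation overestimated robustness. A small sketch using a few rows copied from the table above (the selection is only illustrative); a slightly negative value is possible since, as noted, the reported number may come from a different subset or model.

```python
# (paper, reported robust accuracy, AA robust accuracy) in %, copied from the table above.
rows = [
    ("Gowal et al., 2020 (WRN-70-16, extra data)", 65.87, 65.88),
    ("Carmon et al., 2019", 62.5, 59.53),
    ("Wang et al., 2020", 65.04, 56.29),
    ("Jin & Rinard, 2020", 71.22, 1.35),
    ("Chan et al., 2020", 15.5, 0.26),
]


def overestimate(reported, aa):
    """Percentage points by which the reported robust accuracy exceeds the AA one."""
    return round(reported - aa, 2)


# Print the rows sorted from the largest to the smallest overestimate.
for paper, reported, aa in sorted(rows, key=lambda r: overestimate(r[1], r[2]), reverse=True):
    print(f"{paper:45s} {overestimate(reported, aa):6.2f}")
```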
The robust accuracy is computed at eps = 8/255 in the Linf-norm, except for the models marked with * for which eps = 0.031 is used.\
Note: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training).

Update: this is no longer maintained, but an up-to-date leaderboard is available in RobustBench.
# | paper | model | architecture | clean | report. | AA |
---|---|---|---|---|---|---|
1 | (Gowal et al., 2020)‡ | available | WRN-70-16 | 69.15 | 37.70 | 36.88 |
2 | (Cui et al., 2020)* | available | WRN-34-20 | 62.55 | 30.20 | 30.20 |
3 | (Gowal et al., 2020) | available | WRN-70-16 | 60.86 | 30.67 | 30.03 |
4 | (Cui et al., 2020)* | available | WRN-34-10 | 60.64 | 29.33 | 29.33 |
5 | (Wu et al., 2020b) | available | WRN-34-10 | 60.38 | 28.86 | 28.86 |
6 | (Hendrycks et al., 2019)‡ | available | WRN-28-10 | 59.23 | 33.5 | 28.42 |
7 | (Cui et al., 2020)* | available | WRN-34-10 | 70.25 | 27.16 | 27.16 |
8 | (Chen et al., 2020b) | available | WRN-34-10 | 62.15 | - | 26.94 |
9 | (Sitawarin et al., 2020) | available | WRN-34-10 | 62.82 | 24.57 | 24.57 |
10 | (Rice et al., 2020) | available | RN-18 | 53.83 | 28.1 | 18.95 |
The robust accuracy is computed at eps = 0.3 in the Linf-norm.
# | paper | model | clean | report. | AA |
---|---|---|---|---|---|
1 | (Gowal et al., 2020) | available | 99.26 | 96.38 | 96.34 |
2 | (Zhang et al., 2020a) | available | 98.38 | 96.38 | 93.96 |
3 | (Gowal et al., 2019) | available | 98.34 | 93.78 | 92.83 |
4 | (Zhang et al., 2019b) | available | 99.48 | 95.60 | 92.81 |
5 | (Ding et al., 2020) | available | 98.95 | 92.59 | 91.40 |
6 | (Atzmon et al., 2019) | available | 99.35 | 97.35 | 90.85 |
7 | (Madry et al., 2018) | available | 98.53 | 89.62 | 88.50 |
8 | (Jang et al., 2019) | available | 98.47 | 94.61 | 87.99 |
9 | (Wong et al., 2020) | available | 98.50 | 88.77 | 82.93 |
10 | (Taghanaki et al., 2019) | retrained | 98.86 | 64.25 | 0.00 |
The robust accuracy is computed at eps = 0.5 in the L2-norm.\
Note: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training).
Update: this is no longer maintained, but an up-to-date leaderboard is available in RobustBench.
# | paper | model | architecture | clean | report. | AA |
---|---|---|---|---|---|---|
1 | (Gowal et al., 2020)‡ | available | WRN-70-16 | 94.74 | - | 80.53 |
2 | (Gowal et al., 2020) | available | WRN-70-16 | 90.90 | - | 74.50 |
3 | (Wu et al., 2020b) | available | WRN-34-10 | 88.51 | 73.66 | 73.66 |
4 | (Augustin et al., 2020)‡ | authors | RN-50 | 91.08 | 73.27 | 72.91 |
5 | (Engstrom et al., 2019) | available | RN-50 | 90.83 | 70.11 | 69.24 |
6 | (Rice et al., 2020) | available | RN-18 | 88.67 | 71.6 | 67.68 |
7 | (Rony et al., 2019) | available | WRN-28-10 | 89.05 | 67.6 | 66.44 |
8 | (Ding et al., 2020) | available | WRN-28-4 | 88.02 | 66.18 | 66.09 |
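In the L2 threat model a perturbation is admissible if its Euclidean norm is at most eps (0.5 above). A minimal sketch of the projection onto the L2 ball, assuming flat lists of floats (`project_l2` is a hypothetical helper); it ignores the [0, 1] box constraint, whose exact intersection with the L2 ball needs a more involved projection.

```python
import math


def project_l2(delta, eps):
    """Rescale delta so that its L2 norm is at most eps (projection onto the L2 ball)."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)  # already inside the ball: unchanged
    scale = eps / norm
    return [d * scale for d in delta]


delta = [0.3, 0.4, 0.0, 1.2]  # L2 norm = 1.3
projected = project_l2(delta, eps=0.5)
print(math.sqrt(sum(d * d for d in projected)))
```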
```
pip install git+https://github.com/fra31/auto-attack
```
Import and initialize AutoAttack with
```python
from autoattack import AutoAttack
adversary = AutoAttack(forward_pass, norm='Linf', eps=epsilon, version='standard')
```
where:
- `forward_pass` returns the logits and takes input with components in [0, 1] (NCHW format expected),
- `norm = ['Linf' | 'L2' | 'L1']` is the norm of the threat model,
- `eps` is the bound on the norm of the adversarial perturbations,
- `version = 'standard'` uses the standard version of AA.

To apply the standard evaluation, where the attacks are run sequentially on batches of size `bs` of `images`, use
```python
x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size)
```
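Conceptually, in the standard evaluation each attack is only applied to the points that all previous attacks failed on, and a point counts as robust only if no attack succeeds. The plain-Python sketch below illustrates this bookkeeping with hypothetical toy attacks (each returns True if it fools the model on a given point index); it is not AutoAttack's implementation.

```python
from typing import Callable, Dict


def sequential_evaluation(n_points: int,
                          is_clean_correct: Callable[[int], bool],
                          attacks: Dict[str, Callable[[int], bool]]) -> float:
    """Return the robust accuracy (%) obtained by running the attacks in order.

    Points misclassified on clean inputs are never attacked; every later attack
    only sees the points that all previous attacks failed on.
    """
    robust = [i for i in range(n_points) if is_clean_correct(i)]
    print(f"clean: {100 * len(robust) / n_points:.1f}%")
    for name, attack in attacks.items():
        robust = [i for i in robust if not attack(i)]
        print(f"after {name}: {100 * len(robust) / n_points:.1f}%")
    return 100 * len(robust) / n_points


# Toy data: 10 points, point 9 is already misclassified without any perturbation.
toy_attacks = {
    "attack-1": lambda i: i % 3 == 0,   # succeeds on 0, 3, 6 (9 is already excluded)
    "attack-2": lambda i: i in (1, 4),  # succeeds on 1 and 4
    "attack-3": lambda i: i == 4,       # only hits an already broken point
}
final = sequential_evaluation(10, lambda i: i != 9, toy_attacks)
```

Running attacks only on the remaining points is what makes the sequential scheme cheaper than running every attack on every point, while giving the same worst-case robust accuracy.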
To run the attacks individually, use
```python
dict_adv = adversary.run_standard_evaluation_individual(images, labels, bs=batch_size)
```
which returns a dictionary with the adversarial examples found by each attack.
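With the individual evaluation every attack is run on the full set, so the robust accuracy of the ensemble is the fraction of points that are correctly classified under every attack. A sketch, assuming you have already computed the model's predicted labels on each attack's adversarial examples (the `preds` dictionary and `worst_case_accuracy` helper below are hypothetical toy data and code):

```python
from typing import Dict, List


def worst_case_accuracy(labels: List[int], preds_per_attack: Dict[str, List[int]]) -> float:
    """Robust accuracy (%) where a point counts as robust only if every attack fails on it."""
    n = len(labels)
    robust = [True] * n
    for name, preds in preds_per_attack.items():
        correct = [p == y for p, y in zip(preds, labels)]
        print(f"{name}: {100 * sum(correct) / n:.1f}%")  # per-attack robust accuracy
        robust = [r and c for r, c in zip(robust, correct)]
    return 100 * sum(robust) / n


labels = [0, 1, 2, 3]
preds = {
    "attack-a": [0, 1, 0, 3],  # fools point 2
    "attack-b": [0, 2, 2, 3],  # fools point 1
}
print(worst_case_accuracy(labels, preds))
```

Each toy attack alone leaves 75% of the points robust, but the ensemble leaves only 50%, which is why the worst case over all attacks is what matters.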
To specify a subset of attacks add e.g. `adversary.attacks_to_run = ['apgd-ce']`.
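Since `forward_pass` must accept NCHW inputs with components in [0, 1], any dataset-specific normalization should live inside the forward pass, so that eps is measured in pixel space. A NumPy sketch, assuming uint8 images stored in NHWC layout, a hypothetical per-channel mean/std, and a toy `model` standing in for any function that maps normalized inputs to logits:

```python
import numpy as np


def to_nchw_unit(images_uint8: np.ndarray) -> np.ndarray:
    """Convert uint8 images in NHWC layout to float32 NCHW with values in [0, 1]."""
    x = images_uint8.astype(np.float32) / 255.0
    return np.transpose(x, (0, 3, 1, 2))


def make_forward_pass(model, mean, std):
    """Wrap `model` (which expects normalized inputs) so that it accepts inputs in [0, 1]."""
    mean = np.asarray(mean, dtype=np.float32).reshape(1, -1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(1, -1, 1, 1)

    def forward_pass(x):
        # Normalization happens inside, so perturbations are applied in [0, 1] pixel space.
        return model((x - mean) / std)

    return forward_pass


# Toy example: 2 random 4x4 RGB images and a "model" returning per-channel sums as logits.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2, 4, 4, 3)).astype(np.uint8)
x = to_nchw_unit(images)
forward_pass = make_forward_pass(lambda z: z.sum(axis=(2, 3)),
                                 mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25])
print(x.shape, forward_pass(x).shape)
```

In PyTorch the same idea is typically an `nn.Module` whose `forward` normalizes and then calls the network. Note that a pixel-space perturbation of size eps becomes eps / std after normalization.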
To evaluate models implemented in TensorFlow 1.X, use
```python
from autoattack import utils_tf
model_adapted = utils_tf.ModelAdapter(logits, x_input, y_input, sess)

from autoattack import AutoAttack
adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True)
```
where:
- `logits` is the tensor with the logits given by the model,
- `x_input` is a placeholder for the input for the classifier (NHWC format expected),
- `y_input` is a placeholder for the correct labels,
- `sess` is a TF session.

If TensorFlow's version is 2.X, use
```python
from autoattack import utils_tf2
model_adapted = utils_tf2.ModelAdapter(tf_model)

from autoattack import AutoAttack
adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True)
```
where:
- `tf_model` is a tf.keras model without the final 'softmax' activation, i.e. it returns logits.

The evaluation can be run in the same way as done with PyTorch models.
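The model has to return logits rather than probabilities because softmax saturates: for a confidently classified point the top probability becomes exactly 1.0 in floating point, so it no longer reflects how large the margin is. A plain-Python illustration (not AutoAttack code):

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two very different logit margins (38 vs 98) give the same saturated top probability.
p_small_margin = softmax([40.0, 2.0, 1.0])
p_large_margin = softmax([100.0, 2.0, 1.0])
print(p_small_margin[0] == p_large_margin[0] == 1.0)  # True
```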
Examples of how to use AutoAttack can be found in `examples/`. To run the standard evaluation on a pretrained PyTorch model on CIFAR-10 use
```
python eval.py [--individual] --version=['standard' | 'plus']
```
where the flags select, respectively, the individual evaluation (all the attacks are run on the full test set) and the version of AA to use (see below).
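The two flags map onto the calls described above: `--individual` chooses between `run_standard_evaluation_individual` and `run_standard_evaluation`, while `--version` would be passed as `version=` to the `AutoAttack` constructor. A hypothetical sketch of that dispatch (the parser, the `evaluate` helper and the stub adversary are illustrative, not the actual contents of `eval.py`):

```python
import argparse


def parse_args(argv=None):
    # Hypothetical parser mirroring the flags shown above.
    parser = argparse.ArgumentParser()
    parser.add_argument("--individual", action="store_true",
                        help="run every attack on the full test set")
    parser.add_argument("--version", choices=["standard", "plus"], default="standard")
    return parser.parse_args(argv)


def evaluate(adversary, images, labels, individual, bs):
    """Call the individual or the standard evaluation of an AutoAttack-like object."""
    if individual:
        return adversary.run_standard_evaluation_individual(images, labels, bs=bs)
    return adversary.run_standard_evaluation(images, labels, bs=bs)


class StubAdversary:
    """Stand-in for AutoAttack that only reports which method was called."""

    def run_standard_evaluation(self, images, labels, bs):
        return "standard"

    def run_standard_evaluation_individual(self, images, labels, bs):
        return "individual"


args = parse_args(["--individual", "--version=plus"])
print(args.version, evaluate(StubAdversary(), [], [], args.individual, bs=500))
```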
A more expensive evaluation can be used by specifying `version='plus'` when initializing AutoAttack. This includes
In case of classifiers with stochastic components, one can combine AA with Expectation over Transformation (EoT) as in (Athalye et al., 2018) by specifying `version='rand'` when initializing AutoAttack.
This runs
It is possible to customize the attacks to run by specifying `version='custom'` when initializing the attack and then, for example,
```python
if args.version == 'custom':
    adversary.attacks_to_run = ['apgd-ce', 'fab']
    adversary.apgd.n_restarts = 2
    adversary.fab.n_restarts = 2
```
It is possible to fix the random seed used for the attacks with, e.g., `adversary.seed = 0`. In this case the same seed is used for all the attacks; otherwise a different random seed is picked for each attack.
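Fixing the seed matters because the attacks involve randomness, e.g. random initializations. A plain-Python sketch, assuming a hypothetical `random_start` that samples a starting point uniformly in the Linf ball, shows that the same seed reproduces the same starting point while a different seed generally does not:

```python
import random


def random_start(x, eps, seed):
    """Sample a point uniformly in the Linf ball of radius eps around x, clipped to [0, 1]."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return [min(max(xi + rng.uniform(-eps, eps), 0.0), 1.0) for xi in x]


x = [0.2, 0.5, 0.8]
a = random_start(x, 8 / 255, seed=0)
b = random_start(x, 8 / 255, seed=0)
c = random_start(x, 8 / 255, seed=1)
print(a == b, a == c)
```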
To log the intermediate results of the evaluation, specify `log_path=/path/to/logfile.txt` when initializing the attack.
```
@inproceedings{croce2020reliable,
    title = {Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks},
    author = {Francesco Croce and Matthias Hein},
    booktitle = {ICML},
    year = {2020}
}
```

```
@inproceedings{croce2021mind,
    title = {Mind the box: $l_1$-APGD for sparse adversarial attacks on image classifiers},
    author = {Francesco Croce and Matthias Hein},
    booktitle = {ICML},
    year = {2021}
}
```