fra31 / auto-attack

Code for "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"
https://arxiv.org/abs/2003.01690
MIT License

Question about variation in reported (clean & robust accuracy) metrics #74

Closed · choltz95 closed 2 years ago

choltz95 commented 2 years ago

Hello, I'm running into something confusing and wondering if I am using AA correctly. When I evaluate my network with AA at default settings, there is a small variation in the reported clean and robust accuracies compared to when I compute the metrics myself from the output of run_standard_evaluation:

For example, here I evaluate the clean accuracy:

> (model(x_test).argmax(1) == y_test.argmax(1)).sum()/len(x_test)
0.87

Here, I evaluate the robust accuracy using AA:

> x_adv = adversary.run_standard_evaluation(x_test, y_test.argmax(1), bs=32)
using standard version including apgd-ce, apgd-t, fab-t, square
initial accuracy: 85.00%
.
.
.
robust accuracy: 49.00%
> (model(x_adv).argmax(1) == y_test.argmax(1)).sum()/len(x_adv)
0.54

So first, there is a 2-point difference in clean accuracy (my 87% vs. the 85% initial accuracy AA reports), but more seriously, there is a large difference in the robust accuracies. I have verified the intermediate output of AutoAttack, e.g. when I add up the numbers by hand I get 42.00% accuracy. The strange part is the big difference when I use the x_adv output of AA. Is there something I'm missing about the output x_adv? My defense is basically adversarial training on a WideResNet, so there is no randomness in the forward pass.

fra31 commented 2 years ago

Hi,

the two values should be the same. Is it possible that the model is being used in training mode, as happened in this case?
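
If that is the issue, calling model.eval() before evaluating should make the numbers match. A minimal sketch of the fix, assuming a standard PyTorch model and a one-hot y_test as in your snippets (the eps/norm values here are just illustrative):

import torch
from autoattack import AutoAttack

# In training mode, BatchNorm normalizes with per-batch statistics and
# dropout stays active, so the reported accuracies drift with batch
# composition. eval mode makes the forward pass deterministic.
model.eval()

adversary = AutoAttack(model, norm='Linf', eps=8/255)  # illustrative settings
x_adv = adversary.run_standard_evaluation(x_test, y_test.argmax(1), bs=32)

with torch.no_grad():
    clean_acc = (model(x_test).argmax(1) == y_test.argmax(1)).float().mean()
    robust_acc = (model(x_adv).argmax(1) == y_test.argmax(1)).float().mean()

With the model in eval mode, the robust accuracy AA prints and the one recomputed from x_adv should agree.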

choltz95 commented 2 years ago

This was it! Thanks so much!