We don't perform any model selection and just keep the last model for all evaluations.
Just a word of caution about training with FGSM adversaries only. Models tend to overfit to adversarial examples from such a weak adversary and can end up with lower standard accuracy than adversarial accuracy, which is not a good sign: such models are not genuinely robust or useful.
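For reference, here is a minimal sketch of the kind of single-step FGSM adversary being discussed (the PyTorch helper, variable names, and epsilon value below are illustrative, not the exact code from the repo):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """Single-step FGSM: move each input along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Perturb by epsilon along the gradient sign and clip back to the valid image range.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

Because the perturbation is a single gradient step, it is easy for a network to learn to mask it, which is exactly the overfitting behavior described above.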
Sweet, thank you. I am familiar with this issue of overfitting to the adversary; I was just using it as an example to better explain my question about model selection.
So all models in Table 5 of the paper are simply the last model from training?
Yes, exactly.
Thanks for the fast reply.
After training, we are left with ~80 models, one saved every 1k iterations. What is your rule for selecting the "best" model to keep and use for further evaluations? I am especially wondering because I have noticed that when I train a model against an FGSM adversary only, if I simply select the checkpoint with the greatest robustness to the FGSM adversary, the clean data accuracy may not be that great. Essentially, how do you determine the tradeoff between robustness to the adversary you trained against and clean data accuracy?
I will also specifically reference Table 5 in your paper (i.e., robustness to a whitebox adversary). What were your criteria for choosing the models reported in this table?
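For context, this is a rough sketch of the kind of checkpoint sweep I have been doing (the checkpoint directory, whole-model `torch.save`/`torch.load` format, `test_loader`, and the `fgsm_attack` helper sketched earlier in the thread are my own assumptions, not anything from your code):

```python
import glob
import torch

def accuracy(model, loader, attack=None, device="cuda"):
    """Fraction of examples classified correctly, optionally after applying an attack."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if attack is not None:
            # The attack needs gradients, so it runs outside torch.no_grad().
            x = attack(model, x, y)
        with torch.no_grad():
            correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total

# Sweep every saved checkpoint, recording clean vs. adversarial accuracy,
# then look at the trade-off rather than ranking by a single number.
results = []
for path in sorted(glob.glob("checkpoints/*.pt")):     # hypothetical checkpoint directory
    model = torch.load(path).to("cuda")                 # assumes whole-model checkpoints
    clean = accuracy(model, test_loader)                 # test_loader assumed to exist
    robust = accuracy(model, test_loader, attack=fgsm_attack)  # or any attack with this signature
    results.append((path, clean, robust))

for path, clean, robust in results:
    print(f"{path}: clean={clean:.3f}, fgsm={robust:.3f}")
```

The question is essentially what rule you apply to a table like this, if any, when reporting a single model.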