fra31 / auto-attack

Code relative to "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"

https://arxiv.org/abs/2003.01690

MIT License

Random seed #54

kaustubhsridhar closed this issue 3 years ago

kaustubhsridhar commented 3 years ago

Hi, I am a big fan of autoattack and robustbench. The centralization/standardization of adversarial robustness is so helpful. :)

I'm working on a new approach to adversarial robustness and am evaluating on autoattack.

Unfortunately, I can't exactly replicate the values on robustbench with a random seed of 0 or 1. Could you please share the random seed you use for the numbers on the leaderboard?

Thanks

fra31 commented 3 years ago

Hi,

glad to hear that you find our work useful!

Unfortunately, there's no single seed used for all models, and for many of them it was a random one. There are several reasons for this: for example, the code has been updated over time, and some evaluations come from the authors and we just rerun them. In my experience, for standard defenses without randomization, the variance between different runs is very small, and often the same robust accuracy is found. Do you notice larger variations?
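One way to answer the question about variation is to repeat the evaluation over a handful of seeds and look at the spread. The sketch below is only an illustration: `evaluate` is a hypothetical callable that runs the attack with the given seed and returns robust accuracy in percent, and `fake_evaluate` is a made-up stand-in so the snippet runs on its own. Neither is part of the AutoAttack code.

```python
import statistics
from typing import Callable, Iterable


def spread_over_seeds(evaluate: Callable[[int], float], seeds: Iterable[int]) -> dict:
    """Call evaluate(seed) for each seed and summarize the robust accuracies (in %)."""
    accs = [evaluate(s) for s in seeds]
    return {
        "min": min(accs),
        "max": max(accs),
        "mean": statistics.mean(accs),
        "range": max(accs) - min(accs),
    }


# Hypothetical stand-in for a real evaluation. Replace it with a function that
# seeds the attack and returns the measured robust accuracy in percent.
def fake_evaluate(seed: int) -> float:
    return 53.08 + 0.01 * (seed % 3)


summary = spread_over_seeds(fake_evaluate, seeds=range(5))
print(summary)
```

If the `range` stays within a few hundredths of a percentage point, the seed is unlikely to explain a larger gap.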

kaustubhsridhar commented 3 years ago

Hi,

Thank you for the quick reply. :)

I noticed a not-so-small variation with TRADES (Zhang et al. 2019): after retraining the WRN-34-10, I get 51.70% adversarial accuracy rather than the 53.08% on robustbench. Part of this could be because I'm working with an epsilon of 8.0/255 instead of 0.031, but maybe it's also because of the random seed?
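For reference, the two epsilon values mentioned here are close but not identical. A quick check of the arithmetic (the variable names are just for illustration):

```python
eps_decimal = 0.031      # the rounded budget mentioned above
eps_fraction = 8.0 / 255  # the budget used for the retrained model

abs_diff = eps_fraction - eps_decimal
rel_diff = abs_diff / eps_decimal

print(f"8/255      = {eps_fraction:.6f}")
print(f"difference = {abs_diff:.6f} ({100 * rel_diff:.2f}% larger)")
```

So 8/255 is roughly 0.03137, about 1.2% larger than 0.031, which gives the attack a slightly bigger perturbation budget.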

Thanks

fra31 commented 3 years ago

Using the slightly larger epsilon definitely has an impact, which might already close the gap. Also, I think the randomness in retraining the model might significantly influence the robustness. From what I saw, different runs of AutoAttack might show small fluctuations on the order of 0.02-0.03%.
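To put these percentages in perspective, it can help to convert them into image counts. The sketch below assumes the evaluation uses a 10,000-image test set (the size of the full CIFAR-10 test set). That is an assumption, since the thread does not state the dataset or the number of evaluation points, and `pct_to_images` is a hypothetical helper.

```python
N_TEST = 10_000  # assumption: evaluation on the full CIFAR-10 test set


def pct_to_images(pct_points: float, n: int = N_TEST) -> int:
    """Convert an accuracy difference in percentage points to a number of images."""
    return round(pct_points / 100 * n)


gap = 53.08 - 51.70  # leaderboard value minus the retrained model's value
print("gap in images:", pct_to_images(gap))
print("seed fluctuation in images:", pct_to_images(0.02), "to", pct_to_images(0.03))
```

Under that assumption, the reported gap corresponds to over a hundred images, while run-to-run fluctuations of 0.02-0.03% correspond to only two or three, so the attack's seed alone would not account for the difference.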

kaustubhsridhar commented 3 years ago

Thanks for the numbers. Changing the epsilon does have an impact, but randomness from retraining prevents me from getting the exact numbers on robustbench. Thanks again.