evaluating-adversarial-robustness / adv-eval-paper

LaTeX source for the paper "On Evaluating Adversarial Robustness"
https://arxiv.org/abs/1902.06705

Don't recommend transfer attacks (controversial) #27

Open carlini opened 3 years ago

carlini commented 3 years ago

I can't think of any evaluation (that wasn't obviously wrong ^1) in the last two years where transfer attacks helped invalidate the robustness claims. At best, transfer attacks can reduce accuracy to ~50% or so, and most papers claim less robust accuracy than this in the first place. So it's not surprising that transfer attacks don't do better than any halfway reasonable attack.

However, there are lots of papers that include transfer attacks, cite (Athalye et al. 2018, Carlini et al. 2019), and then say "therefore our evaluation is probably correct". If we have a better idea of what to recommend instead (maybe exclusively ask for gradient-free attacks now? They're a lot better than they were in 2019.), then it might be worth thinking about including that.

This is obviously controversial. There can exist defenses where running transfer attacks would help diagnose problems. I just think it's sufficiently rare that we should either remove the recommendation or downgrade it to something that should only really be done after everything else has been tried.

^1 There are some obviously-wrong evaluations where transfer attacks would have also shown they were wrong. But these are almost always papers that claim >>0% accuracy at linf eps=0.5 or something else absurd. So as long as there's another way to disprove it, arguably the transfer attack hasn't added much new value.

ftramer commented 3 years ago

The only one I can remember is Thermometer Encoding (which I wouldn't characterize as obviously wrong), where the original paper had results showing that transfer attacks worked better than white-box attacks, and this was viewed as a red flag.
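
The red flag described above can be written down as a simple sanity check. This is a minimal sketch, assuming accuracies under attack are reported as fractions in [0, 1]; the function name, the `tolerance` parameter, and the example numbers are hypothetical and are not taken from the paper or from the Thermometer Encoding results.

```python
def flag_suspicious_evaluation(white_box_acc, transfer_acc, tolerance=0.0):
    """Return True if a transfer attack beats the white-box attack.

    Both arguments are model accuracies under attack, as fractions in [0, 1].
    A white-box attacker has strictly more information than a transfer
    attacker, so lower accuracy under the transfer attack suggests the
    white-box attack is failing (for example, because gradients are masked).
    `tolerance` is a hypothetical slack term for evaluation noise.
    """
    for name, acc in (("white_box_acc", white_box_acc),
                      ("transfer_acc", transfer_acc)):
        if not 0.0 <= acc <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {acc}")
    return transfer_acc < white_box_acc - tolerance


# Hypothetical numbers, for illustration only.
print(flag_suspicious_evaluation(white_box_acc=0.80, transfer_acc=0.50))  # True
print(flag_suspicious_evaluation(white_box_acc=0.10, transfer_acc=0.50))  # False
```

The second call illustrates the point made earlier in the thread: when the white-box attack is already stronger than the transfer attack, the transfer result adds no new evidence against the robustness claim.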

But toning down transfer attacks seems in line with putting more focus on the stronger black-box attacks that have emerged over the past few years. At the time this report was first written, I would have been a bit wary of black-box evaluations because the existing attacks (e.g., SPSA or Boundary) were quite brittle. Today, we have much stronger candidates, so transfer attacks are indeed somewhat obsolete in a white-box threat model.