carlini / nn_robust_attacks

Robust evasion attacks against neural networks to find adversarial examples
BSD 2-Clause "Simplified" License

Why do the examples stop being adversarial when I only change the weights of the attacked model? #26

Closed xieyi318 closed 5 years ago

xieyi318 commented 5 years ago

Hi, when I trained the default model for the first time, I could generate adversarial examples against those weights with a 100% attack success rate. But then I trained the same model one more time and saved the new weights; when I attack this second model with the examples generated against the first one, it does not work: the test accuracy is over 65%. Does this make sense? I thought adversarial examples should transfer between copies of the same model with different weights. Thank you for your time!
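
(For concreteness, a minimal sketch of the transfer check described above, assuming the MNIST model and helpers from this repo's setup_mnist.py; the .npy file names and the second weight file are hypothetical placeholders, not repo code.)

```python
# Sketch: measure how well saved adversarial examples transfer to an
# independently retrained copy of the same architecture (TF 1.x style).
import numpy as np
import tensorflow as tf
from setup_mnist import MNIST, MNISTModel

with tf.Session() as sess:
    data = MNIST()
    # Same architecture, second (retrained) set of weights -- hypothetical path.
    model2 = MNISTModel("models/mnist_run2", sess)

    adv = np.load("adv_examples.npy")      # crafted against the first model
    labels = np.load("true_labels.npy")    # one-hot ground-truth labels

    logits = sess.run(model2.predict(tf.constant(adv, dtype=tf.float32)))
    acc = np.mean(np.argmax(logits, 1) == np.argmax(labels, 1))
    # High accuracy on these inputs means the attack did NOT transfer.
    print("retrained model accuracy on adversarial examples: %.1f%%" % (100 * acc))
```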

carlini commented 5 years ago

It does have transferability -- it worked 45% of the time here. See Figure 9 of the paper for some comparison of confidence vs. transferability.

xieyi318 commented 5 years ago

So when I try to explore defense methods, is setting a higher confidence, such as 1, the best choice?

carlini commented 5 years ago

Yes. See also other papers on improving transferability, e.g., https://arxiv.org/abs/1611.02770.
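
(For reference, a minimal sketch of running the L2 attack with a higher confidence value, using CarliniL2 from this repo's l2_attack.py; the target-label setup here is a simplified assumption rather than the generate_data logic in test_attack.py.)

```python
# Sketch: raising the confidence parameter (kappa in the paper) when running
# the L2 attack; higher confidence pushes examples further past the decision
# boundary, trading extra distortion for better transferability.
import numpy as np
import tensorflow as tf
from setup_mnist import MNIST, MNISTModel
from l2_attack import CarliniL2

with tf.Session() as sess:
    data = MNIST()
    model = MNISTModel("models/mnist", sess)

    # confidence=0 is the default (minimal-perturbation examples);
    # larger values are what Figure 9 of the paper varies.
    attack = CarliniL2(sess, model, batch_size=9,
                       max_iterations=1000, confidence=20)

    inputs = data.test_data[:9]
    # The attack is targeted by default: shift each one-hot label by one
    # class so every target differs from the true class.
    targets = np.roll(data.test_labels[:9], 1, axis=1)
    adv = attack.attack(inputs, targets)
```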