bethgelab / foolbox

A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
https://foolbox.jonasrauber.de
MIT License

Return image for each gradient step #259

Closed xerxes01 closed 5 years ago

xerxes01 commented 5 years ago

I need to evaluate my model's defence on the perturbed images obtained by the RPGD attack; however, foolbox does not return the image when the attack fails. Which specific module should be edited in order to return the failed adversarial sample?

wielandbrendel commented 5 years ago

I don't think you need the failed adversarial sample - that's just some image which is not adversarial, so I don't see why you would be interested in it. Are you generating adversarials on the undefended model and testing them on the defended one?

xerxes01 commented 5 years ago

How do I then measure my model's defence accuracy, like they do here: https://www.robust-ml.org/defenses/ ? Or are those claims just the inverse attack success rate (1 - probability of outputting the adversarial target class)? I can't use robustml to evaluate because the dataset I use isn't supported by it.

wielandbrendel commented 5 years ago

Right - that's just the inverse attack success rate.
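As a minimal sketch (assuming the foolbox 1.x per-sample API, where `attack(image, label)` returns `None` if no adversarial is found; the exact return value on failure may differ between versions), the computation could look like this:

```python
import foolbox

def defence_accuracy(fmodel, images, labels):
    """Estimate defence accuracy as 1 - attack success rate.

    fmodel: a foolbox-wrapped model, e.g. foolbox.models.PyTorchModel(...)
    images, labels: iterables of single numpy images and integer labels
    """
    # RandomPGD is the random-start PGD attack; Linf is the L-infinity distance
    attack = foolbox.attacks.RandomPGD(fmodel, distance=foolbox.distances.Linf)

    successes = 0
    for image, label in zip(images, labels):
        adversarial = attack(image, label)  # assumed to return None on failure
        if adversarial is not None:
            successes += 1

    attack_success_rate = successes / len(labels)
    return 1.0 - attack_success_rate  # the inverse attack success rate
```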

xerxes01 commented 5 years ago

Oh, okay! Is it the same for measuring defence against targeted attacks as well?

wielandbrendel commented 5 years ago

Sure.

rufinv commented 5 years ago

Sorry for butting in; I'm confused by this answer: isn't it possible for a targeted attack to fail and yet the model still misclassifies the perturbed image? (Say, original_class=dog, adversarial_target=car, model.predict(perturbed_image)=pizza.) In that case, isn't model_accuracy different from (1 - attack_success_rate)?

wielandbrendel commented 5 years ago

Well, a targeted attack by definition considers a perturbed image adversarial only if it is classified as the target class. Everything else does not count as adversarial, so you keep computing the attack success rate the same way as before.
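Concretely, that just means a sample counts as a successful attack only when the perturbed image is classified as the target class; any other misclassification (e.g. "pizza" instead of the target "car") still counts as a failed attack. A small illustrative sketch (the `predictions` and `target_labels` arrays here are hypothetical, not part of the foolbox API):

```python
import numpy as np

# predictions: model predictions on the perturbed images (hypothetical example)
# target_labels: the attack's target class for each sample (hypothetical example)
predictions = np.array([2, 7, 3])    # second sample ends up as class 7 ("pizza")
target_labels = np.array([2, 1, 3])  # but the target for that sample was class 1 ("car")

# A targeted attack succeeds only if the prediction equals the target class.
attack_success_rate = np.mean(predictions == target_labels)

# This is the number reported as robustness against the targeted attack;
# it is not the same as the classification accuracy on the perturbed images.
robustness = 1.0 - attack_success_rate
```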

xerxes01 commented 5 years ago

Got it! Thanks!