bethgelab / foolbox

A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
https://foolbox.jonasrauber.de
MIT License
2.79k stars 426 forks

Are wrongly classified images sorted out? #713

Closed jS5t3r closed 1 year ago

jS5t3r commented 1 year ago

Assume a pre-trained classifier. The clean accuracy is 87.5%.

That means 12.5% of the images are wrongly classified by the pre-trained PyTorch model before the attack is even run. What happens to these 12.5% inside the attack method?

xp_, x_, success = attack(fmodel, images, labels, epsilons=[8./255.])

Are these 12.5% ignored in advance?
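
For reference, a minimal sketch of the setup the question describes, using the foolbox 3.x API (the ResNet-18 model and the ImageNet sample batch are placeholder choices for illustration, not taken from the issue):

import torchvision.models as models
import foolbox as fb

# A standard pre-trained ImageNet model stands in for the classifier
# from the question (hypothetical choice; any PyTorch model works).
model = models.resnet18(pretrained=True).eval()
preprocessing = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], axis=-3)
fmodel = fb.PyTorchModel(model, bounds=(0, 1), preprocessing=preprocessing)

images, labels = fb.utils.samples(fmodel, dataset="imagenet", batchsize=16)

# Clean accuracy before the attack (87.5% in the question).
print(fb.accuracy(fmodel, images, labels))

attack = fb.attacks.LinfPGD()
raw, clipped, success = attack(fmodel, images, labels, epsilons=[8 / 255])

# `success` has shape (num_epsilons, batch_size); every sample in the
# batch is attacked, including those the model already misclassifies.
print(success.float().mean(dim=-1))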

zimmerrol commented 1 year ago

No, they are not ignored. Why should they be? The criterion for when a sample should be ignored depends on the attack: if you run an untargeted attack, the 12.5% of misclassified samples should be excluded, since they already satisfy the misclassification criterion; for a targeted attack, however, it doesn't make sense to ignore them.
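
To illustrate the untargeted case, a minimal sketch of how one could exclude the already-misclassified samples before the attack (continuing from a `fmodel`, `images`, `labels` setup as in the sketch above; this filtering is done by the user, not by foolbox):

# Keep only the samples the model classifies correctly, so the untargeted
# attack is evaluated on the ~87.5% that are not already misclassified.
predictions = fmodel(images).argmax(dim=-1)
correct = predictions == labels  # boolean mask over the batch

attack = fb.attacks.LinfPGD()
raw, clipped, success = attack(
    fmodel, images[correct], labels[correct], epsilons=[8 / 255]
)

# Robust accuracy, computed only over the correctly classified subset.
robust_acc = 1 - success.float().mean(dim=-1)
print(robust_acc)

Without this filtering, the already-misclassified 12.5% count as trivially successful attacks and bias the robust accuracy downward; with a targeted criterion the situation is different, since a misclassified sample is not necessarily classified as the target class.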