bethgelab / foolbox

A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
https://foolbox.jonasrauber.de
MIT License

CW L2 attack uses predictions instead of logits #618

ines21 opened this issue 3 years ago

ines21 commented 3 years ago

Running the Carlini and Wagner attack, I was having less success than reported in the paper. I noticed that the Foolbox implementation was using the final normalised predictions (softmax outputs) instead of the unnormalised logits, which makes the attack less effective than it is supposed to be (especially against defensive distillation).

Passing a model that outputs logits may be the responsibility of the person running the attack, but it is still worth mentioning in the documentation.
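For context, a minimal sketch of the workaround (model, hyperparameters, and shapes are illustrative assumptions, not taken from the issue or the Foolbox docs): strip a trailing softmax from the PyTorch model so that the L2 Carlini & Wagner attack sees unnormalised logits.

```python
import torch
import torch.nn as nn
import foolbox as fb

# `trained_net` is a hypothetical model that ends in a softmax layer.
trained_net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 10),
    nn.Softmax(dim=1),
)

# Drop the final softmax so the attack operates on raw logits.
logits_net = nn.Sequential(*list(trained_net.children())[:-1]).eval()

fmodel = fb.PyTorchModel(logits_net, bounds=(0, 1))
attack = fb.attacks.L2CarliniWagnerAttack(binary_search_steps=9, steps=1000)

images = torch.rand(8, 1, 28, 28)       # placeholder inputs
labels = torch.randint(0, 10, (8,))     # placeholder labels
raw, clipped, success = attack(fmodel, images, labels, epsilons=None)
print("attack success rate:", success.float().mean().item())
```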

jangop commented 2 years ago

If I am not mistaken, this does not pertain only to CW, correct?

This should definitely be mentioned in the documentation, and it might even be worthwhile to include a heuristic check when initializing an fmodel to ensure that the final layer does not resemble softmax. I remember doing something similar (in a different context) when comparing several pretrained models to ensure all gave me logits.
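A rough sketch of such a heuristic (an assumption, not part of Foolbox): probe the model with a random batch and flag outputs that look like softmax probabilities, i.e. non-negative values that sum to one along the class dimension.

```python
import torch

def looks_like_softmax(model, input_shape=(4, 3, 224, 224), atol=1e-3):
    """Return True if the model's outputs resemble probabilities rather than logits."""
    with torch.no_grad():
        out = model(torch.rand(*input_shape))
    nonnegative = (out >= 0).all().item()
    sums_to_one = torch.allclose(
        out.sum(dim=-1), torch.ones(out.shape[0]), atol=atol
    )
    return nonnegative and sums_to_one

# Hypothetical usage before wrapping the model:
# if looks_like_softmax(net):
#     raise ValueError("Model appears to end in softmax; pass a logits model instead.")
```

The check can produce false positives (a logits model could by chance produce outputs summing to one), so it is only suitable as a warning rather than a hard error.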

zimmerrol commented 2 years ago

I agree that the documentation should be more concrete in that regard. I suppose it makes sense to overhaul the documentation soon. We can collect these ideas in #654.

ines21 commented 2 years ago

Yep @jangop, that is what I ended up implementing myself. It has been a long time since I worked on this project, but if I remember correctly, lots of other attacks worked well on softmax models; C&W was the one for which this was crucial.