bethgelab / foolbox

A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
https://foolbox.jonasrauber.de
MIT License

DeepFool does not handle the case of two classes having the same logit well #282

Closed: jonasrauber closed this issue 2 years ago

jonasrauber commented 5 years ago

This was reported to me by @max-andr.

Possibly, the fix is to change the residual_labels function to use <= instead of < and to explicitly exclude the original class.

@max-andr: In case you create a PR anyway to make DeepFool more similar to the latest reference implementation, could you make this part of it? If you don't plan to do that, please let me know.
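A minimal sketch of the tie problem (toy logits; the list comprehensions are illustrative stand-ins, not foolbox's actual residual_labels code):

```python
import numpy as np

logits = np.array([2.0, 2.0, 1.0])  # original class 0 tied with class 1
original = 0

# strict comparison (assumed current behavior): the tied class 1 is
# silently dropped together with the original class
strict = [k for k in range(len(logits)) if logits[k] < logits[original]]

# proposed fix: <= plus an explicit exclusion keeps the tied class
fixed = [k for k in range(len(logits))
         if logits[k] <= logits[original] and k != original]

print(strict)  # [2]
print(fixed)   # [1, 2]
```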

jonasrauber commented 5 years ago

related to #283

max-andr commented 5 years ago

Just to be clear: we observed that this is really crucial for DeepFool. In the current implementation, for some points the attack converges exactly to the decision boundary between two classes, i.e. two classes (the original class and some other one) receive exactly the same logits. After converging to this decision boundary, that other class is suddenly excluded, so DeepFool completely changes its direction and starts to converge to some third class. This is certainly not the intended behavior.
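The direction flip can be seen in a toy version of DeepFool's target selection (hypothetical numbers and a made-up closest helper using the standard |f_k| / ||w_k|| criterion from the DeepFool paper; not foolbox's code):

```python
import numpy as np

# a point sitting exactly on the A/B decision boundary of a toy model
logits = np.array([1.0, 1.0, 0.2])   # A (original) and B tied, C far away
grads = np.eye(3)                    # stand-in per-class gradients

original = 0
f = logits - logits[original]        # logit gaps relative to A
w = grads - grads[original]          # gradient gaps relative to A
norms = np.linalg.norm(w, axis=1)

def closest(candidates):
    # DeepFool targets the class minimizing |f_k| / ||w_k||
    return min(candidates, key=lambda k: abs(f[k]) / norms[k])

print(closest([1, 2]))  # 1: the tied class B (distance 0) is the target
print(closest([2]))     # 2: once B is excluded, the attack jumps to C
```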

Possibly, the fix is to change the residual_labels function to use <= instead of < and to explicitly exclude the original class.

This sounds good. I can make the corresponding PR. Would this way of getting residual_labels be consistent with the rest of the library, or does it not matter?

jonasrauber commented 5 years ago

Thanks again. What I didn't quite get when we talked in person: Foolbox returns the best adversarial it has seen throughout the attack (not necessarily the last point), so the result shouldn't suddenly get worse.

max-andr commented 5 years ago

Foolbox returns the best adversarial it has seen throughout the attack (not necessarily the last point)

But if DeepFool finds any adversarial example, it stops immediately. What Francesco and I observed, however, was that DeepFool didn't stop (i.e. no adversarial example was found); instead, after converging to the decision boundary between classes A and B, it suddenly jumped to a class C.

Seems like it's also closely related to the adversarial criterion: https://github.com/bethgelab/foolbox/issues/283#issuecomment-473826909

Although np.argmax is indeed deterministic, it breaks exact ties by returning the lowest index, so the final decision depends on the class indices. In the unlucky case where the tied class has a higher index than label, np.argmax returns label itself, so the data point on the decision boundary is not regarded as an adversarial example, and then this jump to some other class C happens.
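Concretely (toy logits; this relies only on np.argmax's documented lowest-index tie-breaking):

```python
import numpy as np

# np.argmax resolves exact ties by returning the lowest index
logits = np.array([0.5, 0.5, 0.1])  # classes 0 and 1 tied on the boundary
top1 = int(np.argmax(logits))       # 0

# misclassification criterion: top1 != label
print(top1 != 0)  # False: label 0 on the boundary is NOT adversarial
print(top1 != 1)  # True:  label 1 on the same boundary IS adversarial
```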