bethgelab / foolbox

A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
https://foolbox.jonasrauber.de
MIT License
2.77k stars 427 forks

DeepFool doesn't exactly match the latest reference implementation #283

Closed jonasrauber closed 2 years ago

jonasrauber commented 5 years ago

This was reported to me by @max-andr. Most of the differences are actually explicitly mentioned in comments in our implementation, but we should check again whether we can match the reference implementation more closely and possibly mention deviations in the docs, not just in comments.

@max-andr might create a PR to fix this

jonasrauber commented 5 years ago

related to #282

jonasrauber commented 5 years ago

Apparently, the DeepFool reference implementation changed after release (and after the Foolbox version was created), which explains some of the undocumented deviations, see e.g. https://github.com/LTS4/DeepFool/commit/10cf6425b54b33757a20a3ec56e812634da15d3f

jonasrauber commented 5 years ago

In case we switch to logits instead of softmax, we might want to keep softmax as an option.

wielandbrendel commented 5 years ago

By softmax you mean cross entropy? The difference of cross entropies is equivalent to the difference in logits.
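A minimal numerical check of this equivalence (an illustrative sketch with made-up logits, not Foolbox code): the log-sum-exp normalization cancels, so the difference of two cross-entropy losses equals the (negated) difference of the corresponding logits.

    import numpy as np

    z = np.array([2.0, -1.0, 0.5])           # hypothetical logits
    log_p = z - np.log(np.sum(np.exp(z)))    # log-softmax
    ce = -log_p                              # cross-entropy loss per target label

    i, j = 0, 2
    # CE_j - CE_i == z_i - z_j: the normalization term cancels in the difference
    assert np.isclose(ce[j] - ce[i], z[i] - z[j])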

jonasrauber commented 5 years ago

yes, I meant cross-entropy

wielandbrendel commented 5 years ago

Ok, but at least in that sentence our cross-entropy implementation should be equivalent to the logit-based original implementation.

jonasrauber commented 5 years ago

in that sentence

which sentence?

max-andr commented 5 years ago

By softmax you mean cross entropy? The difference of cross entropies is equivalent to the difference in logits.

Sorry, I overlooked the fact that they are exactly equivalent, so the cross-entropy part is fine. The main problem that we encountered was actually #282.

So regarding this issue, there is only one question left. In the official PyTorch implementation the overshooting is added on every step, whereas the original paper suggests doing the overshooting only at the end, which also agrees with the official MATLAB implementation. Foolbox implements the former and adds the overshooting term on every iteration: perturbed = perturbed + 1.05 * perturbation

This difference is explicitly mentioned in the comments, so Foolbox users are potentially aware of it, which is good.

However, a potential problem with this implementation (I'm not sure how thoroughly the authors of DeepFool tested their PyTorch implementation) is that this kind of overshooting may fail in some cases. Namely, we observed that in some cases the perturbation can already become a 0-vector (i.e. the point is at the decision boundary), and thus on every iteration we just add 1.05 * 0 = 0. So the point stays exactly at the decision boundary, and not on its opposite side as the idea of overshooting would suggest.

I think that it's actually fine to some extent (although it differs from the original paper), but the main question is whether you count such a point (where 2 classes have exactly the same maximum logit) as an adversarial example in the end. Or is it decided non-deterministically which class is the argmax in the end?
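To make the difference concrete, here is a simplified 1-D sketch of the two overshoot schemes (hypothetical values, not actual Foolbox/DeepFool code):

    import numpy as np

    x_original = np.array([0.0])
    boundary = np.array([1.0])   # suppose the iterate has landed exactly on the decision boundary
    overshoot = 0.05

    # Per-step overshoot: the last per-iteration perturbation is the zero vector,
    # so adding 1.05 * 0 leaves the point stuck on the boundary.
    perturbation = np.array([0.0])
    perturbed = boundary + 1.05 * perturbation                        # still [1.0]

    # Overshoot applied to the total perturbation pushes past the boundary.
    x_adv = x_original + (1 + overshoot) * (boundary - x_original)    # [1.05]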

jonasrauber commented 5 years ago

If I am not mistaken, it is deterministic and well-defined (numpy.argmax returns the smaller one); nevertheless, it might not necessarily be what we want.

wielandbrendel commented 5 years ago

The PyTorch implementation performs the overshoot in a better way by multiplying the total deviation:

x_adv = x_original + (1 + overshoot) * (x_adv - x_original)

max-andr commented 5 years ago

If I am not mistaken, it is deterministic and well-defined (numpy.argmax returns the smaller one)

Indeed. From the docs: "In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned."
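For example (a tiny illustrative check, not taken from Foolbox):

    import numpy as np

    np.argmax([0.5, 2.0, 2.0])   # -> 1: the first (smallest) index among the tied maxima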

The PyTorch implementation performs the overshoot in a better way by multiplying the total deviation: x_adv = x_original + (1 + overshoot) * (x_adv - x_original)

Oh yes, I didn't notice that. The variant that multiplies the total perturbation should work better.

max-andr commented 5 years ago

nevertheless, it might not necessarily be what we want.

Seems like the Misclassification criterion would work (i.e. output that the point is adversarial) roughly 50% of the time for the cases when 2 logits are exactly the same: https://github.com/bethgelab/foolbox/blob/master/foolbox/criteria.py#L184

    def is_adversarial(self, predictions, label):
        top1 = np.argmax(predictions)
        return top1 != label

i.e. if the tied competing class has a smaller index than label (so top1 < label), then the point on the decision boundary between the 2 classes will be counted as an adversarial example, but if it has a larger index, then top1 equals label and the point is not counted as adversarial.
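A toy illustration of this tie case (hypothetical values, not actual Foolbox usage):

    import numpy as np

    predictions = np.array([1.0, 3.0, 3.0])   # classes 1 and 2 share the maximum logit
    top1 = np.argmax(predictions)             # -> 1, the first occurrence

    top1 != 2   # label = 2: True, counted as adversarial
    top1 != 1   # label = 1: False, not counted as adversarial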

That said, a proper overshooting scheme (i.e. multiplying the total perturbation by 1 + overshoot) will make such cases on the decision boundary extremely unlikely.