Closed: jonasrauber closed this issue 2 years ago
related to #282
Apparently, the DeepFool reference implementation changed after release (and after the Foolbox version was created), which explains some of the undocumented deviations, see e.g. https://github.com/LTS4/DeepFool/commit/10cf6425b54b33757a20a3ec56e812634da15d3f
In case we switch to logits instead of softmax, we might want to keep softmax as an option.
By softmax you mean cross entropy? The difference of cross entropies is equivalent to the difference in logits.
yes, I meant cross-entropy
Ok, but at least in that sentence our cross-entropy implementation should be equivalent to the logit-based original implementation.
in that sentence
which sentence?
By softmax you mean cross entropy? The difference of cross entropies is equivalent to the difference in logits.
Sorry, I overlooked the fact that they are exactly equivalent. Then the cross-entropy part is fine, and the main problem that we encountered was actually #282.
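For reference, a minimal numpy sketch (not Foolbox code, with made-up logit values) of why the two are equivalent: the log-sum-exp term of the softmax cancels in any difference of cross-entropies, leaving exactly a (negated) difference of logits.

import numpy as np

z = np.array([2.0, -1.0, 0.5, 3.0])   # made-up logits for one input
logsumexp = np.log(np.exp(z).sum())
ce = -(z - logsumexp)                  # cross-entropy loss for each target class

k0, k = 3, 1                           # original class and a competing class
# the logsumexp cancels: CE(k0) - CE(k) == z[k] - z[k0], so differences of
# cross-entropies are (negated) differences of logits
assert np.isclose(ce[k0] - ce[k], z[k] - z[k0])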
So regarding this issue, there is only one question left.
In the official PyTorch implementation the overshoot is added on every step. However, the original paper suggests applying the overshoot only at the end, which also agrees with the official MATLAB implementation.
Foolbox implements the former and adds the overshoot term on every iteration:
perturbed = perturbed + 1.05 * perturbation
This difference is explicitly mentioned in the comments, so Foolbox users are potentially aware of it, which is good.
However, a potential problem with this implementation (I'm not sure how thoroughly the authors of DeepFool tested their PyTorch implementation) is that this kind of overshooting may fail in some cases. Namely, we observed that in some cases perturbation can already become a 0-vector (i.e. the point is exactly on the decision boundary), and thus on every iteration we just add 1.05 * 0 = 0. So the point stays exactly on the decision boundary, and not on its opposite side as the idea of overshooting would suggest.
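To make this failure mode concrete, here is a minimal toy sketch (the values are made up; the variable names follow the snippet above): once the iterate sits exactly on the boundary, the computed step is the zero vector and the per-step additive overshoot adds nothing.

import numpy as np

perturbed = np.array([0.3, 0.3])          # made-up iterate sitting on the boundary
perturbation = np.zeros_like(perturbed)   # DeepFool step computed at the boundary

# per-step additive overshoot (the variant currently in Foolbox):
perturbed = perturbed + 1.05 * perturbation
# 1.05 * 0 == 0, so the iterate does not move and stays exactly on the boundary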
I think that it's actually fine to some extent (although it differs from the original paper), but the main question is whether you count such a point (where 2 classes have exactly the same maximum logit) as an adversarial example in the end, or whether it is decided non-deterministically which class is the argmax in the end?
If I am not mistaken, it is deterministic and well-defined (numpy.argmax returns the smaller index); nevertheless, it might not necessarily be what we want.
The PyTorch implementation performs the overshoot in a better way by multiplying the total deviation:
x_adv = x_original + (1 + overshoot) * (x_adv - x_original)
If I am not mistaken, it is deterministic and well-defined (numpy.argmax returns the smaller index)
Indeed. From the docs: "In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned."
The PyTorch implementation performs the overshoot in a better way by multiplying the total deviation:
x_adv = x_original + (1 + overshoot) * (x_adv - x_original)
Oh yes, I didn't notice that. The variant with multiplying the total perturbation should work better.
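For comparison, a minimal toy sketch of that variant (made-up values, same toy setup as above): even if the last per-step perturbation was zero, the accumulated deviation is nonzero, so scaling it by 1 + overshoot still pushes the point past the boundary.

import numpy as np

overshoot = 0.02
x_original = np.array([0.0, 0.0])
x_adv = np.array([0.3, 0.3])   # iterate that ended up exactly on the boundary

# overshoot applied to the total deviation:
x_adv = x_original + (1 + overshoot) * (x_adv - x_original)
# the accumulated deviation is nonzero, so the point ends up slightly beyond
# the boundary even though the last per-step update was zero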
nevertheless, it might not necessarily be what we want.
Seems like the Misclassification criterion would work (i.e. output that the point is adversarial) in roughly 50% of the cases where 2 logits are exactly the same:
https://github.com/bethgelab/foolbox/blob/master/foolbox/criteria.py#L184
def is_adversarial(self, predictions, label):
    top1 = np.argmax(predictions)
    return top1 != label
i.e. if the competing class has a smaller index than label (so top1 < label), then the point on the decision boundary between the 2 classes will be counted as an adversarial example, but if it has a larger index, then argmax returns label itself and the point will not be counted.
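A minimal sketch of that behavior (simplified to a plain function outside the Criterion class, with made-up logits):

import numpy as np

def is_adversarial(predictions, label):
    # simplified, function-level version of the Misclassification check
    return np.argmax(predictions) != label

label = 1
tie_with_smaller_index = np.array([0.7, 0.7, 0.1])   # tied with class 0 < label
tie_with_larger_index = np.array([0.1, 0.7, 0.7])    # tied with class 2 > label

print(is_adversarial(tie_with_smaller_index, label))  # True: argmax is 0, the first maximum
print(is_adversarial(tie_with_larger_index, label))   # False: argmax is 1 == label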
Although a proper overshooting scheme (i.e. multiplying the total perturbation by 1 + overshoot) will make such cases on the decision boundary extremely unlikely.
This was reported to me by @max-andr. Most of the differences are actually explicitly mentioned in comments in our implementation, but we should check again whether we can match the reference implementation more closely and possibly mention deviations in the docs, not just in comments.
@max-andr might create a PR to fix this