Closed: samuelemarro closed this issue 4 years ago
Does this lead to an actual problem with an attack? gradient() is not really an interface for users; it is intended to be used by the attacks.
In DeepFool, the magnitude of the perturbation is inversely proportional to the norm of the gradient difference. This means that a smaller gradient makes DeepFool seriously overshoot. For example, DeepFool with a batch size of 50-100 returns unrecognizable images on CIFAR-10.
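To make the scaling concrete, here is a rough sketch of DeepFool's linearized step (the numbers and the deepfool_step helper are illustrative, not Foolbox's actual implementation). The logit difference in the numerator is unaffected by the loss reduction, but the gradient difference appears squared in the denominator, so a gradient divided by b makes the step b times larger:

```python
import numpy as np

# Illustrative only: DeepFool's linearized step for a two-class case.
# f_diff is the logit difference (not affected by the loss reduction),
# w_diff is the gradient difference returned by the model.
def deepfool_step(f_diff, w_diff):
    return abs(f_diff) / (np.linalg.norm(w_diff) ** 2) * w_diff

w = np.array([0.2, -0.1, 0.05])  # "true" gradient difference
b = 50                           # batch size

correct = deepfool_step(1.0, w)
scaled = deepfool_step(1.0, w / b)  # gradient accidentally divided by b

# The resulting perturbation is b times larger, hence the overshoot.
print(np.linalg.norm(scaled) / np.linalg.norm(correct))  # 50.0
```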
Thanks for reporting this. I have not yet looked at it in detail, but this might be a problem that was introduced with the batch support in 2.0.
Thanks again, this is indeed a bug and will be fixed in the next release.
Your proposed fix is correct: nn.CrossEntropyLoss(reduction='sum')
Released 2.4.0 with the fix.
OS: Windows 10
Python Version: 3.7.1
Foolbox Version: 2.3.0
Torch Version: 1.4.0 (CUDA 10.1)
When I call .gradients() with a batch of size b, the returned gradient is always grad / b. Code to reproduce:
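A minimal sketch of the comparison (net is only a placeholder for a trained CIFAR-10 classifier, and the call uses gradient() as in the Foolbox 2.x batch interface, so treat the exact signature as an assumption):

```python
import numpy as np
import torch
import foolbox

# Placeholder model: any CIFAR-10 classifier would do.
net = torch.nn.Sequential(torch.nn.Flatten(),
                          torch.nn.Linear(3 * 32 * 32, 10)).eval()
fmodel = foolbox.models.PyTorchModel(net, bounds=(0, 1), num_classes=10)

images = np.random.uniform(size=(10, 3, 32, 32)).astype(np.float32)
labels = np.zeros(10, dtype=np.int64)

# Gradient of the first image computed alone vs. inside a batch of 10.
grad_single = fmodel.gradient(images[:1], labels[:1])
grad_batched = fmodel.gradient(images, labels)[:1]

# With Foolbox 2.3.0 this prints ~0.1 instead of ~1.0.
print(np.abs(grad_batched).sum() / np.abs(grad_single).sum())
```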
By setting batch_size = 10, the new gradients are exactly 1/10 of the original ones. The problem also appears with .forward_and_gradient(). A possible cause is the reduction used by nn.CrossEntropyLoss(): by default, CrossEntropyLoss returns the mean of the loss across the whole batch, so the loss is divided by the batch size. Using nn.CrossEntropyLoss(reduction='sum') fixes the problem, but I don't know if it's a legitimate solution or just a workaround.
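The reduction behaviour itself can be checked in plain PyTorch, independently of Foolbox (a minimal sketch; the linear model is just a placeholder):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)
x = torch.randn(10, 4)
y = torch.zeros(10, dtype=torch.long)

def input_grad(reduction):
    x_ = x.clone().requires_grad_(True)
    loss = torch.nn.CrossEntropyLoss(reduction=reduction)(model(x_), y)
    loss.backward()
    return x_.grad

g_mean = input_grad('mean')
g_sum = input_grad('sum')

# reduction='mean' divides the loss (and hence every per-sample gradient)
# by the batch size, so g_sum == batch_size * g_mean.
print(torch.allclose(g_sum, 10 * g_mean))  # True
```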