bethgelab / foolbox

A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
https://foolbox.jonasrauber.de
MIT License
2.79k stars 426 forks

Error: TargetClassProbability criterion is inconsistent (PyTorch) #157

Closed maxdinech closed 6 years ago

maxdinech commented 6 years ago

The TargetClassProbability criterion seems to behave strangely in some cases: for example, with FashionMNIST:

fmodel = foolbox.models.PyTorchModel(model, (0, 1), num_classes=10, channel_axis=1, cuda=False)
criterion = foolbox.criteria.TargetClassProbability(target_class, 0.2)
attack = foolbox.attacks.ContrastReductionAttack(fmodel, criterion)
adv = torch.Tensor(attack(np.array(img).reshape(1, 28, 28), img_class)).view(1, 1, 28, 28)

gives an adversarial example that is classified as the target class with a probability of 83%.

But with the exact same code, merely raising the criterion's target-class probability to 0.5 causes the attack to fail, even though it should in principle succeed for any threshold below 0.83.

Is this expected?
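For reference, a minimal sketch of what a target-class-probability criterion checks (a NumPy illustration under my own assumptions, not foolbox's actual source): it softmaxes the model output and compares the target class's probability against the threshold, so an example reaching 0.83 should pass any threshold below that value.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def is_adversarial(logits, target_class, p):
    # The criterion is met when the softmax probability
    # of the target class exceeds the threshold p.
    return softmax(logits)[target_class] > p

# Logits chosen so the target class (index 0) has probability 0.83.
logits = np.log(np.array([0.83, 0.10, 0.07]))
print(is_adversarial(logits, 0, 0.2))  # True
print(is_adversarial(logits, 0, 0.5))  # True  -- still below 0.83
```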

maxdinech commented 6 years ago

This issue was partially explained by the fact that the predictions given to the criterion need to be the pre-softmax output values (logits) of the neural network (see #158).
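A minimal sketch of that failure mode (hypothetical logit values, not foolbox internals): if the model's forward pass already applies softmax, the criterion softmaxes a second time, which flattens the distribution. For a 10-class model like FashionMNIST, the double-softmaxed maximum can never exceed e/(e+9) ≈ 0.232, so a 0.2 threshold is reachable but a 0.5 threshold never is:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical pre-softmax outputs for a 10-class model.
logits = np.array([8.0, 1.0, 0.5, 0.0, -1.0, 0.2, -0.5, 0.3, -2.0, 0.1])

probs = softmax(logits)   # correct criterion input: ~0.997 on class 0
double = softmax(probs)   # model already softmaxed -> criterion softmaxes again

print(probs[0])   # ~0.997 -- would pass both the 0.2 and 0.5 thresholds
print(double[0])  # ~0.231 -- passes 0.2 but can never reach 0.5
```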