bethgelab / foolbox

A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
https://foolbox.jonasrauber.de
MIT License

Support for virtual adversarial perturbations #55

Closed peck94 closed 7 years ago

peck94 commented 7 years ago

Foolbox currently only supports finding adversarial examples on samples that are correctly classified to begin with. If one tries to run an attack on an incorrectly classified sample, the following warning results:

Not running the attack because the original image is already misclassified and the adversarial thus has a distance of 0.

However, in some contexts (such as Virtual Adversarial Training), it is useful to find adversarial examples on incorrectly classified inputs as well. That is, the training label is completely discarded and one instead only focuses on the label assigned by the classifier. These examples are called "virtual adversarial examples". Would it be possible to add support for them?

jonasrauber commented 7 years ago

Hi @peck94, it might be possible to add support for this. However, I am not sure it's necessary:

  1. Foolbox is not designed for speed (and thus not for virtual adversarial training), but rather as a broad collection of attacks for test time. I would be interested to hear more about whether you use Foolbox successfully for VAT.

  2. It might be a good-enough work-around to just pass np.argmax(model.predictions(image)) as the label; that way the attack will always run (see the sketch at the end of this comment). I understand that this is not the same as running the attack with the original label despite a wrong classification of the original image, but it might be good enough depending on your use case.

Let me know what you think. If you really have a use case and the work-around is not good enough, I am totally open to supporting this.
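
For reference, here is a minimal sketch of that work-around, assuming the Foolbox 1.x-style API from around the time of this issue (the KerasModel wrapper, the bounds, the FGSM attack choice, and the keras_model / image placeholders are illustrative assumptions, not requirements):

```python
import numpy as np
import foolbox

# Assumption: `keras_model` is your trained Keras classifier and `image` is a
# single input in the model's expected format; both are placeholders here.
fmodel = foolbox.models.KerasModel(keras_model, bounds=(0, 1))

# Work-around from above: use the model's own prediction as the label,
# so the attack runs even when the original image is misclassified.
virtual_label = np.argmax(fmodel.predictions(image))

# Any attack could be used here; FGSM is just one example.
attack = foolbox.attacks.FGSM(fmodel)
adversarial = attack(image, virtual_label)
```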

peck94 commented 7 years ago

I personally haven't done VAT yet, so I can't say whether Foolbox is adequate for that purpose or not. Regardless, the work-around you proposed did the trick for me. Thanks for the swift response, @jonasrauber!