Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

How to generate DeepFool attack with l_2 constraints? #1089

Closed · kotleta2007 closed this issue 3 years ago

kotleta2007 commented 3 years ago

I have tried generating adversarial samples with DeepFool with the eps parameter set to 0.2, but the resulting samples had an l_2 distance from the originals far greater than 0.2.

Is it possible to use another parameter, or to modify the value of eps, to ensure that the l_2 norm of (adversarial - original) stays below a given threshold?

Thank you in advance.
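For anyone looking for the workaround the question implies: DeepFool searches for a minimal boundary-crossing perturbation and does not take a hard norm budget, so one option, sketched below as a post-processing step rather than anything built into ART, is to project the finished adversarial samples back onto an l_2 ball of radius eps around the originals. The helper name `project_l2` and the per-sample flattening are illustrative choices, not ART API.

```python
import numpy as np

def project_l2(x_orig, x_adv, eps):
    """Illustrative post-processing: pull each adversarial sample back onto
    the l_2 ball of radius eps centred on its original counterpart."""
    n = len(x_orig)
    delta = (x_adv - x_orig).reshape(n, -1)                   # per-sample perturbation
    norms = np.linalg.norm(delta, axis=1, keepdims=True)      # per-sample l_2 norm
    scale = np.minimum(1.0, eps / np.maximum(norms, 1e-12))   # shrink only oversized ones
    return (x_orig.reshape(n, -1) + delta * scale).reshape(x_orig.shape)
```

Note that projecting can undo the attack: samples pulled back inside the ball may be classified correctly again, so adversarial accuracy should be re-evaluated after projection.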

beat-buesser commented 3 years ago

Hi @kotleta2007, thank you very much for using ART!

Are you referring to the argument epsilon of DeepFool? Can you share the code section where you define the attack?
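For context while waiting for the code: ART documents the epsilon argument of DeepFool as an overshoot parameter. A minimal sketch of that idea, assuming the usual DeepFool formulation in which the final perturbation r is scaled by (1 + epsilon) to push the sample past the decision boundary (the arrays below are made-up toy values, not ART internals):

```python
import numpy as np

x = np.zeros(4)                       # toy clean sample
r = np.array([3.0, 0.0, 0.0, 0.0])    # hypothetical minimal perturbation
for epsilon in (1e-6, 0.2):
    x_adv = x + (1 + epsilon) * r     # overshoot scales r, it does not cap it
    print(epsilon, np.linalg.norm(x_adv - x))   # 3.000003, then 3.6
```

Under that reading, epsilon=0.2 enlarges the perturbation by 20% rather than bounding its l_2 norm at 0.2.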

kotleta2007 commented 3 years ago

Hello!

Indeed, I was referring to the epsilon argument.

In the following code I attack a simple fully connected MNIST classifier. Even though epsilon is set to 0.2, every adversarial sample ends up much farther than 0.2 from its original:

```python
import numpy as np
import tensorflow as tf

from art.attacks.evasion import DeepFool
from art.estimators.classification import TensorFlowV2Classifier

mnist = tf.keras.datasets.mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(
    loss='SparseCategoricalCrossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model.fit(X_train, y_train, epochs=10)
model.evaluate(X_test, y_test)

ART_classifier = TensorFlowV2Classifier(
    model=model,
    nb_classes=10,
    input_shape=(28, 28),
    loss_object=tf.keras.losses.SparseCategoricalCrossentropy(),
    clip_values=(0, 1)
)

attack = DeepFool(classifier=ART_classifier, epsilon=0.2)
SAMPLE_SIZE = 10
X_test_adv = attack.generate(X_test[:SAMPLE_SIZE])

_, adv_accuracy = model.evaluate(X_test_adv, y_test[:SAMPLE_SIZE])
print('Accuracy on adversarial test data: {:4.2f}%'.format(adv_accuracy * 100))

for i in range(SAMPLE_SIZE):
    print("l2 distance from original: {}".format(np.linalg.norm(X_test_adv[i] - X_test[i], ord=2)))
```

Output:

```
l2 distance from original: 7.5198902445117515
l2 distance from original: 12.517949078098
l2 distance from original: 15.583265832646825
l2 distance from original: 10.309606071366971
l2 distance from original: 12.67709685435253
l2 distance from original: 14.319209476402303
l2 distance from original: 14.030292651152768
l2 distance from original: 11.192603835883371
l2 distance from original: 13.267377609462349
l2 distance from original: 9.74294408168186
```
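A side note on the measurement itself: each `X_test` sample is a (28, 28) array, and `np.linalg.norm(A, ord=2)` on a 2-D array returns the matrix 2-norm (the largest singular value) rather than the elementwise l_2 distance. Flattening each difference first, as sketched below using the variables from the script above, gives the per-pixel l_2 norm the question is about:

```python
import numpy as np

for i in range(SAMPLE_SIZE):
    diff = (X_test_adv[i] - X_test[i]).ravel()   # flatten so norm defaults to vector l_2
    print("l2 distance from original: {}".format(np.linalg.norm(diff)))
```

Since the spectral norm never exceeds the Frobenius norm, the flattened distances will be at least as large as the values printed above.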

beat-buesser commented 3 years ago

Hi @kotleta2007

Thank you for the example code. I have noticed two items: