Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

How to generate DeepFool attack with l_2 constraints? #1089

Closed · kotleta2007 closed this issue 3 years ago

kotleta2007 commented 3 years ago

I have tried generating adversarial samples with DeepFool with the eps parameter set to 0.2, but the resulting samples had an l_2 distance from the originals far greater than 0.2.

Is it possible to use another parameter, or to modify the value of eps, to ensure that the l_2 norm of (adversarial - original) stays below a given threshold?

Thank you in advance.
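For anyone looking for the workaround the question implies: DeepFool searches for a minimal boundary-crossing perturbation and does not take a hard norm budget, so one option, sketched below as a post-processing step rather than anything built into ART, is to project the finished adversarial samples back onto an l_2 ball of radius eps around the originals. The helper name `project_l2` and the per-sample flattening are illustrative choices, not ART API.

```python
import numpy as np

def project_l2(x_orig, x_adv, eps):
    """Illustrative post-processing: pull each adversarial sample back onto
    the l_2 ball of radius eps centred on its original counterpart."""
    n = len(x_orig)
    delta = (x_adv - x_orig).reshape(n, -1)                   # per-sample perturbation
    norms = np.linalg.norm(delta, axis=1, keepdims=True)      # per-sample l_2 norm
    scale = np.minimum(1.0, eps / np.maximum(norms, 1e-12))   # shrink only oversized ones
    return (x_orig.reshape(n, -1) + delta * scale).reshape(x_orig.shape)
```

Note that projecting can undo the attack: samples pulled back inside the ball may be classified correctly again, so adversarial accuracy should be re-evaluated after projection.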

beat-buesser commented 3 years ago

Hi @kotleta2007, thank you very much for using ART!

Are you referring to the argument epsilon of DeepFool? Can you share the code section where you define the attack?
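For context while waiting for the code: ART documents the epsilon argument of DeepFool as an overshoot parameter. A minimal sketch of that idea, assuming the usual DeepFool formulation in which the final perturbation r is scaled by (1 + epsilon) to push the sample past the decision boundary (the arrays below are made-up toy values, not ART internals):

```python
import numpy as np

x = np.zeros(4)                       # toy clean sample
r = np.array([3.0, 0.0, 0.0, 0.0])    # hypothetical minimal perturbation
for epsilon in (1e-6, 0.2):
    x_adv = x + (1 + epsilon) * r     # overshoot scales r, it does not cap it
    print(epsilon, np.linalg.norm(x_adv - x))   # 3.000003, then 3.6
```

Under that reading, epsilon=0.2 enlarges the perturbation by 20% rather than bounding its l_2 norm at 0.2.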

kotleta2007 commented 3 years ago

Hello!

Indeed, I was referring to the epsilon argument.

In the following code I attack a simple fully connected MNIST classifier. Even though epsilon is set to 0.2, every adversarial sample ends up much farther than 0.2 from its original:

```python
import numpy as np
import tensorflow as tf

from art.attacks.evasion import DeepFool
from art.estimators.classification import TensorFlowV2Classifier

mnist = tf.keras.datasets.mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(
    loss='SparseCategoricalCrossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model.fit(X_train, y_train, epochs=10)
model.evaluate(X_test, y_test)

ART_classifier = TensorFlowV2Classifier(
    model=model,
    nb_classes=10,
    input_shape=(28, 28),
    loss_object=tf.keras.losses.SparseCategoricalCrossentropy(),
    clip_values=(0, 1)
)

attack = DeepFool(classifier=ART_classifier, epsilon=0.2)
SAMPLE_SIZE = 10
X_test_adv = attack.generate(X_test[:SAMPLE_SIZE])

_, adv_accuracy = model.evaluate(X_test_adv, y_test[:SAMPLE_SIZE])
print('Accuracy on adversarial test data: {:4.2f}%'.format(adv_accuracy * 100))

for i in range(SAMPLE_SIZE):
    print("l2 distance from original: {}".format(np.linalg.norm(X_test_adv[i] - X_test[i], ord=2)))
```

Output:

```
l2 distance from original: 7.5198902445117515
l2 distance from original: 12.517949078098
l2 distance from original: 15.583265832646825
l2 distance from original: 10.309606071366971
l2 distance from original: 12.67709685435253
l2 distance from original: 14.319209476402303
l2 distance from original: 14.030292651152768
l2 distance from original: 11.192603835883371
l2 distance from original: 13.267377609462349
l2 distance from original: 9.74294408168186
```
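A side note on the measurement itself: each `X_test` sample is a (28, 28) array, and `np.linalg.norm(A, ord=2)` on a 2-D array returns the matrix 2-norm (the largest singular value) rather than the elementwise l_2 distance. Flattening each difference first, as sketched below using the variables from the script above, gives the per-pixel l_2 norm the question is about:

```python
import numpy as np

for i in range(SAMPLE_SIZE):
    diff = (X_test_adv[i] - X_test[i]).ravel()   # flatten so norm defaults to vector l_2
    print("l2 distance from original: {}".format(np.linalg.norm(diff)))
```

Since the spectral norm never exceeds the Frobenius norm, the flattened distances will be at least as large as the values printed above.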

beat-buesser commented 3 years ago

Hi @kotleta2007

Thank you for the example code. I have noticed two items: