Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

Difference in adversarial images and adversarial accuracy across different platforms (Keras Classifier and TensorflowV2 Classifier). #279

Closed shashankkotyan closed 4 years ago

shashankkotyan commented 4 years ago

Bug Description
I notice a change in adversarial accuracy and adversarial images when I switch from KerasClassifier to TensorFlowV2Classifier.

Results
These results are for the Fast Gradient Sign Method. I attacked the first 100 images in the CIFAR-10 test dataset. The values in [*] show the number of images successfully attacked for each class.
KC: [ 6,6,4,8,4,8,12,10,9,10 ] = 77 images successfully attacked out of 100 images.
TF: [ 1,2,2,2,3,4,0,4,4,4 ] = 26 images successfully attacked out of 100 images.

To Reproduce
The Keras classifier is loaded by the following line:
classifier = art.classifiers.KerasClassifier(model)

The TensorFlow classifier is loaded by the following line:
classifier = art.classifiers.TensorFlowV2Classifier(model, 10, (32, 32, 3), loss_object=tf.keras.losses.CategoricalCrossentropy())

The attack is initialised by the following line:
attack = art.attacks.FastGradientMethod(classifier=classifier)

Expected behaviour
I expected the adversarial accuracy and the adversarial images generated by the attack to be the same, but they differ between the two frameworks.

Screenshots
Keras adversarial image: [image: Keras_1_1_0]

TensorFlow adversarial image: [image: Tensorflow_1_1_0]

System information
Keras test: Python 3.6.10
TensorflowV2 test: Python 3.7.4

All the tests were performed on Ubuntu 18.04.3 LTS (bionic).

beat-buesser commented 4 years ago

Hi @shashankkotyan Thank you very much for using ART and raising this issue! To get a better understanding, could you please provide additional information about your model (structure, loss function, etc.)?

shashankkotyan commented 4 years ago

Thank you for your prompt reply @beat-buesser.

I have used a vanilla ResNet architecture whose weights I load from a model weights (.h5) file. In both experiments, Keras and TensorflowV2, I use the same .h5 file. The only difference is that the first experiment uses the Keras module and the second uses the TensorflowV2 module.

Code Snippet:

# In the Keras experiment
from keras import initializers, layers, models, optimizers, regularizers

# In the TensorflowV2 experiment
from tensorflow.keras import initializers, layers, models, optimizers, regularizers

img_rows, img_cols, img_channels = 32,32,3
stack_n = 5
weight_decay = 0.0001
optimizer = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)

def residual_block(img_input, out_channel, increase=False):
    if increase: stride = (2,2)
    else: stride = (1,1)
    x = img_input
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=stride,padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    if increase:
        projection = layers.Conv2D(out_channel, kernel_size=(1,1), strides=(2,2), padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
        return layers.add([x, projection])
    else: return layers.add([img_input, x])

img_input = layers.Input(shape=(img_rows, img_cols, img_channels))

x = layers.Conv2D(filters=16,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
for _ in range(stack_n):    x = residual_block(x, 16, False)
x = residual_block(x, 32, True)
for _ in range(1, stack_n): x = residual_block(x, 32, False)
x = residual_block(x, 64, True)
for _ in range(1, stack_n): x = residual_block(x, 64, False)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(10, name='Output', activation='softmax', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)

model = models.Model(img_input, x)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

model.load_weights(f"model_weights.h5")

beat-buesser commented 4 years ago

@shashankkotyan I have used your code above to train a model with Keras 2.3.1 and have tested it, after loading from the h5-file, with KerasClassifier and with TensorFlowV2Classifier on TensorFlow 2.1. I received identical adversarial examples in both cases using FastGradientMethod. I have also checked that the loss gradients provided by the two classifiers are identical. So far I'm not able to reproduce your observation.

Can you confirm that you are using the same h5-file for both of your experiments (it could have been overwritten)? Could you provide a single example script that reproduces the difference you have observed? I recommend setting clip_values=(0,1) or clip_values=(0,255) for the ART classifiers; this makes sure that the pixels of the adversarial example stay in the valid range.
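
To illustrate why clip_values matters, here is a minimal plain-Python sketch of a single L-infinity FGSM step. It is not ART's implementation: the helper names `sign` and `fgsm_step` and the gradient values are made up for illustration, and the input is a flat list of pixels rather than an image tensor.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_step(x, grad, eps, clip_values=None):
    """Hypothetical helper: one FGSM step on a flat list of pixel values.

    Each pixel moves by eps in the direction of the sign of the loss
    gradient; if clip_values is given, the result is clamped to that range.
    """
    adv = [xi + eps * sign(gi) for xi, gi in zip(x, grad)]
    if clip_values is not None:
        lo, hi = clip_values
        adv = [min(max(a, lo), hi) for a in adv]
    return adv


x = [0.0, 0.5, 1.0]        # pixels already scaled to [0, 1]
grad = [-2.0, 0.0, 3.0]    # made-up loss gradient, for illustration only

unclipped = fgsm_step(x, grad, eps=0.3)
clipped = fgsm_step(x, grad, eps=0.3, clip_values=(0.0, 1.0))

print(unclipped)  # first and last pixel leave the [0, 1] range
print(clipped)    # every pixel stays inside [0, 1]
```

Without clipping, the perturbed pixels at the edges of the range fall outside the interval the model was trained on, which is the situation the recommendation above is meant to avoid.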

shashankkotyan commented 4 years ago

@beat-buesser I have checked the h5 file and it remains the same across my tests. It has an accuracy of 92.7% on the CIFAR-10 test dataset, as reported by the sample script below. I have also checked the clip values setting. In my case, I pass a preprocessed image to the attack, so I use clip_values=(0,1).

A summary of adversarial accuracy on the first 100 samples is

                       With Clip Value   Without Clip Value
Keras Version          47/100            59/100
Tensorflow V2 Version  9/100             26/100

If possible, can you also check the adversarial accuracy across multiple images?

Also, just to mention, I use two environments to reproduce this: one has Tensorflow 2.1.0, and the other has Tensorflow 1.13.1 and Keras 2.2.3.

The h5 file I use for the tests. model_weights.zip

Code Example to Reproduce

import os, sys
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

import tensorflow as tf
tf.get_logger().setLevel("ERROR")

# ! Change of These Parameters ! 
keras_opt = True
clip_opt  = False

fname = ""
if keras_opt: fname = f"{fname}KerasVersion"
else:         fname = f"{fname}TensorflowVersion"

if clip_opt:  fname = f"{fname} With ClipValues"
else:         fname = f"{fname} Without ClipValues"

if keras_opt:
    from keras import datasets, initializers, layers, models, optimizers, regularizers
else:
    from tensorflow.keras import datasets, initializers, layers, models, optimizers, regularizers 

import numpy as np

num_images   = {'train': 50000, 'test': 10000}

num_classes  = 10
dataset_name = 'Cifar10'
class_names  = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

mean = [125.307, 122.95, 113.865]
std  = [62.9932, 62.0887, 66.7048]

(raw_x_train, raw_y_train), (raw_x_test, raw_y_test) = datasets.cifar10.load_data()
raw_y_train, raw_y_test = raw_y_train[:,0], raw_y_test[:,0]

def color_preprocess(imgs):
    if imgs.ndim < 4: imgs = np.array([imgs])
    imgs = imgs.astype('float32')
    for i in range(3): imgs[:,:,:,i] = (imgs[:,:,:,i] - mean[i]) / std[i]
    return imgs

def color_postprocess(imgs):
    if imgs.ndim < 4: imgs = np.array([imgs])
    imgs = imgs.astype('float32')
    for i in range(3): imgs[:,:,:,i] = (imgs[:,:,:,i] * std[i]) + mean[i]
    return imgs.astype(int)

img_rows, img_cols, img_channels = 32,32,3
stack_n = 5
weight_decay = 0.0001
optimizer = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)

def residual_block(img_input, out_channel, increase=False):
    if increase: stride = (2,2)
    else: stride = (1,1)
    x = img_input
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=stride,padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    if increase:
        projection = layers.Conv2D(out_channel, kernel_size=(1,1), strides=(2,2), padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
        return layers.add([x, projection])
    else: return layers.add([img_input, x])

img_input = layers.Input(shape=(img_rows, img_cols, img_channels))

x = layers.Conv2D(filters=16,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
for _ in range(stack_n):    x = residual_block(x, 16, False)
x = residual_block(x, 32, True)
for _ in range(1, stack_n): x = residual_block(x, 32, False)
x = residual_block(x, 64, True)
for _ in range(1, stack_n): x = residual_block(x, 64, False)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(num_classes, name='Output', activation='softmax', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)

model = models.Model(img_input, x)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

model.load_weights(f"model_weights.h5")

pred = np.argmax(model.predict(color_preprocess(raw_x_test)), axis=1)

# Accuracy of the model remains 0.927 across the tests which confirms that the h5 file is not overwritten.
print(f"{np.sum(pred==raw_y_test)/len(raw_y_test):.3f}")

from art import attacks, classifiers

if keras_opt:
    if clip_opt:
        classifier = classifiers.KerasClassifier(model, use_logits=False, channel_index=3, clip_values=(0,1), defences=None, preprocessing=(0, 1), input_layer=0, output_layer=0)
    else:
        classifier = classifiers.KerasClassifier(model, use_logits=False, channel_index=3, clip_values=None, defences=None, preprocessing=(0, 1), input_layer=0, output_layer=0)
else:
    if clip_opt:
        classifier = classifiers.TensorFlowV2Classifier(model, num_classes, (img_rows, img_cols, img_channels), loss_object=tf.keras.losses.CategoricalCrossentropy(), train_step=None, channel_index=3, clip_values=(0,1), defences=None, preprocessing=(0, 1))
    else:
        classifier = classifiers.TensorFlowV2Classifier(model, num_classes, (img_rows, img_cols, img_channels), loss_object=tf.keras.losses.CategoricalCrossentropy(), train_step=None, channel_index=3, clip_values=None, defences=None, preprocessing=(0, 1))

attacker = attacks.evasion.FastGradientMethod(classifier=classifier, norm=np.inf, targeted=False, eps=0.3)

def attack(x,y):
    adv_x           = color_postprocess(attacker.generate(color_preprocess(x)))[0]
    prior_probs     = model.predict(color_preprocess(x))[0]
    predicted_probs = model.predict(color_preprocess(adv_x))[0]
    actual_class    = y # np.argmax(prior_probs)
    predicted_class = np.argmax(predicted_probs)
    success         = predicted_class != actual_class
    return adv_x, success

samples = 100
adv_xs  = []
succeses = []
for x, y in zip(raw_x_test[:samples], raw_y_test[:samples]):
    adv_x, success = attack(x, y)
    adv_xs   += [adv_x]
    succeses += [success]

grid = np.sqrt(samples).astype(int)

original = raw_x_test[:samples]
original = original.reshape(grid, grid, img_rows, img_cols, img_channels).swapaxes(1, 2).reshape(grid*img_rows, grid*img_cols, img_channels)
adversarial = np.array(adv_xs)
adversarial =adversarial.reshape(grid, grid, img_rows, img_cols, img_channels).swapaxes(1, 2).reshape(grid*img_rows, grid*img_cols, img_channels)

indices = np.where(np.array(succeses) == True)[0]

from matplotlib import pyplot as plt

fig        = plt.figure(1, figsize=(20,10), dpi=300)
(ax1, ax2) = fig.subplots(1,2)
ax1.imshow(original.astype(int))
ax2.imshow(adversarial.astype(int))
ax1.set_xticks([]); ax2.set_xticks([])
ax1.set_yticks([]); ax2.set_yticks([])
ax1.set_xlabel("Original Images")
ax2.set_xlabel(f"Adversarial Images {indices}")
fig.tight_layout()
fig.savefig(f"{fname} Adversarial Accuracy {np.sum(succeses)} out of {samples}", bbox_inches="tight", dpi=300)

Images Generated by the Code

Keras Version With Clip Values (Adv Acc 47/100) KerasVersion With ClipValues Adversarial Accuracy 47 out of 100

Keras Version Without Clip Values (Adv Acc 59/100) KerasVersion Without ClipValues Adversarial Accuracy 59 out of 100

Tensorflow Version With Clip Values (Adv Acc 9/100) TensorflowVersion With ClipValues Adversarial Accuracy 9 out of 100

Tensorflow Version Without Clip Values (Adv Acc 26/100) TensorflowVersion Without ClipValues Adversarial Accuracy 26 out of 100

shashankkotyan commented 4 years ago

Edited Issue Comment to include more specific details about the testing environments.

beat-buesser commented 4 years ago

@shashankkotyan Thank you very much for the great example script! Sorry for the delay, but I think I have finally identified the reasons for your observations.

The current version of TensorFlowV2Classifier in ART 1.1.0 only supports SparseCategoricalCrossentropy because it calls the loss function with index labels. Unfortunately, it does not warn or inform the user. This was fixed in commit 46eeb2ff0ec4aec6d07c6b799611b36fc84768bb, which is already on branch dev_1.2.0 and will be published in ART 1.2.0 in a few weeks.

If I change the two loss function definitions in your script to use SparseCategoricalCrossentropy, I observe identical success rates with ART v1.1.0 for all combinations reported above. Small variations in the success rates, on the order of 1%, can sometimes be observed; these could be caused by numerical differences between the implementations in the external frameworks:

model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

and

... , loss_object=tf.keras.losses.SparseCategoricalCrossentropy(), ...

in the lines creating TensorFlowV2Classifier classifiers.
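
The label-format mismatch described above can be sketched in plain Python. These are textbook reference formulas, not TensorFlow's or ART's code, and the function names and probabilities are chosen only for illustration. The point is that the categorical loss expects a one-hot vector while the sparse loss expects a class index, and the two agree only when each receives the format it expects.

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Reference formula: expects a one-hot label vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(label, probs):
    """Reference formula: expects an integer class index."""
    return -math.log(probs[label])


def to_one_hot(label, num_classes):
    """Convert a class index into a one-hot list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


probs = [0.1, 0.7, 0.2]   # made-up softmax output for a 3-class model
label = 1                 # index label, the format ART 1.1.0 passes to the loss

loss_sparse = sparse_categorical_crossentropy(label, probs)
loss_dense = categorical_crossentropy(to_one_hot(label, len(probs)), probs)

print(loss_sparse, loss_dense)  # the two values agree
```

If a wrapper always passes index labels, as described for ART 1.1.0, then a loss written for one-hot vectors receives input in the wrong format, so its value and gradient no longer correspond to the intended cross-entropy.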

A few things that I have changed in your script:

This is your script with modifications that I have used for my experiments, please let me know if you can repeat the experiments, I hope it works:

import os, sys
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

import tensorflow as tf
tf.get_logger().setLevel("ERROR")

# ! Change of These Parameters ! 
keras_opt = True
clip_opt  = True
clip_values = (0, 255)

fname = ""
if keras_opt: fname = f"{fname}KerasVersion"
else:         fname = f"{fname}TensorflowVersion"

if clip_opt:  fname = f"{fname} With ClipValues"
else:         fname = f"{fname} Without ClipValues"

if keras_opt:
    from keras import datasets, initializers, layers, models, optimizers, regularizers, utils, backend, __version__
else:
    from tensorflow.keras import datasets, initializers, layers, models, optimizers, regularizers, utils, backend, __version__

import numpy as np

num_images   = {'train': 50000, 'test': 10000}

num_classes  = 10
dataset_name = 'Cifar10'
class_names  = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

mean = [125.307, 122.95, 113.865]
std  = [62.9932, 62.0887, 66.7048]

(raw_x_train, raw_y_train), (raw_x_test, raw_y_test) = datasets.cifar10.load_data()

raw_x_train = raw_x_train.astype('float32')
raw_x_test = raw_x_test.astype('float32')

img_rows, img_cols, img_channels = 32,32,3
stack_n = 5
weight_decay = 0.0001
optimizer = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)

def residual_block(img_input, out_channel, increase=False):
    if increase: stride = (2,2)
    else: stride = (1,1)
    x = img_input
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=stride,padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    if increase:
        projection = layers.Conv2D(out_channel, kernel_size=(1,1), strides=(2,2), padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
        return layers.add([x, projection])
    else: return layers.add([img_input, x])

img_input = layers.Input(shape=(img_rows, img_cols, img_channels))

x = layers.Conv2D(filters=16,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
for _ in range(stack_n):    x = residual_block(x, 16, False)
x = residual_block(x, 32, True)
for _ in range(1, stack_n): x = residual_block(x, 32, False)
x = residual_block(x, 64, True)
for _ in range(1, stack_n): x = residual_block(x, 64, False)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(10, name='Output', activation='softmax', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)

model = models.Model(img_input, x)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

model.load_weights(f"model_weights.h5")

from art import attacks, classifiers

if keras_opt:
    if clip_opt:
        classifier = classifiers.KerasClassifier(model, use_logits=False, channel_index=3, clip_values=clip_values, defences=None, preprocessing=(mean, std), input_layer=0, output_layer=0)
    else:
        classifier = classifiers.KerasClassifier(model, use_logits=False, channel_index=3, clip_values=None, defences=None, preprocessing=(mean, std), input_layer=0, output_layer=0)
else:
    if clip_opt:
        classifier = classifiers.TensorFlowV2Classifier(model, num_classes, (img_rows, img_cols, img_channels), loss_object=tf.keras.losses.SparseCategoricalCrossentropy(), train_step=None, channel_index=3, clip_values=clip_values, defences=None, preprocessing=(mean, std))
    else:
        classifier = classifiers.TensorFlowV2Classifier(model, num_classes, (img_rows, img_cols, img_channels), loss_object=tf.keras.losses.SparseCategoricalCrossentropy(), train_step=None, channel_index=3, clip_values=None, defences=None, preprocessing=(mean, std))

pred = np.argmax(classifier.predict(raw_x_test), axis=1)

# Accuracy of the model remains 0.927 across the tests which confirms that the h5 file is not overwritten.
print(f"{np.sum(pred==raw_y_test[:,0])/len(raw_y_test):.3f}")

attacker = attacks.evasion.FastGradientMethod(classifier=classifier, norm=np.inf, targeted=False, eps=2)

def attack(x,y):
    x = np.expand_dims(x, axis=0)
    adv_x           = attacker.generate(x)
    prior_probs     = classifier.predict(x)[0]
    predicted_probs = classifier.predict(adv_x.astype(np.float32))[0]
    actual_class    = y # np.argmax(prior_probs)
    predicted_class = np.argmax(predicted_probs)
    success         = predicted_class != actual_class
    return adv_x, success

samples = 100
adv_xs  = []
succeses = []
for x, y in zip(raw_x_test[:samples], raw_y_test[:samples]):
    adv_x, success = attack(x, y)
    adv_xs   += [adv_x]
    succeses += [success]

print(np.sum(succeses))
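
The modified script passes preprocessing=(mean, std) to the ART classifiers and works on raw pixels in [0, 255], whereas the original script standardized the images itself and attacked them with eps=0.3 and clip_values=(0,1). The sketch below only does the arithmetic for that change of scale; it assumes, as I understand the ART documentation, that the preprocessing tuple means (subtrahend, divisor), and the helper name `standardize` is made up for illustration.

```python
# Per-channel CIFAR-10 statistics taken from the scripts in this thread.
mean = [125.307, 122.95, 113.865]
std = [62.9932, 62.0887, 66.7048]


def standardize(pixel, channel):
    """Hypothetical helper: (pixel - mean) / std for one channel."""
    return (pixel - mean[channel]) / std[channel]


# Range covered by standardized pixels, per channel.
ranges = [(standardize(0.0, c), standardize(255.0, c)) for c in range(3)]

# Size of an eps=0.3 step taken in standardized space, in raw pixel units.
eps_raw_equivalent = [0.3 * s for s in std]

for c, (lo, hi) in enumerate(ranges):
    print(f"channel {c}: standardized range [{lo:.3f}, {hi:.3f}], "
          f"eps=0.3 corresponds to {eps_raw_equivalent[c]:.1f} raw units")
```

Under these assumptions the standardized pixels extend well below 0 and above 1, so clip_values=(0,1) on standardized inputs would cut off part of the valid image range, and eps=0.3 in standardized space is a much larger step than eps=2 in raw pixel space. The success counts of the two scripts are therefore not directly comparable.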

shashankkotyan commented 4 years ago

@beat-buesser Thank you for your thorough explanation. I have checked your script and it produces the expected results. Thank you also for your suggestions to make the script more crisp and concise.

A summary of adversarial accuracy on the first 100 samples with the modified script is

                       With Clip Value   Without Clip Value
Keras Version          51/100            51/100
Tensorflow V2 Version  53/100            53/100

I agree that there could be small variations across platforms, but I opened this issue because the differences in the earlier script were large.

I would recommend mentioning in the current documentation that the TensorflowV2 module only implements SparseCategoricalCrossentropy until ART 1.2.0 is released, as this is not mentioned (or maybe I have missed it).

beat-buesser commented 4 years ago

@shashankkotyan Thank you very much for confirming the results and your suggestions!

beat-buesser commented 4 years ago

@shashankkotyan Thank you for your help! This should now be fixed with the release of ART 1.1.1.