Difference in adversarial images and adversarial accuracy across different platforms (Keras Classifier and TensorflowV2 Classifier).

Bug Description

The adversarial accuracy and the adversarial images change when I switch from KerasClassifier to TensorFlowV2Classifier.

Results for the Fast Gradient Sign Method on the first 100 images of the Cifar-10 Test Dataset. The bracketed values give the number of successfully attacked images for each class.

KC (KerasClassifier): [ 6,6,4,8,4,8,12,10,9,10 ] = 77 images successfully attacked out of 100 images.
TF (TensorFlowV2Classifier): [ 1,2,2,2,3,4,0,4,4,4 ] = 26 images successfully attacked out of 100 images.

To Reproduce

The Keras classifier is created with the following line:
classifier = art.classifiers.KerasClassifier(model)

The TensorFlow classifier is created with the following line:
classifier = art.classifiers.TensorFlowV2Classifier(model, 10, (32, 32, 3), loss_object=tf.keras.losses.CategoricalCrossentropy())

The attack is initialised with the following line:
attack = art.attacks.FastGradientMethod(classifier=classifier)

Expected behaviour

I expected the adversarial accuracy and the adversarial images generated by the attack to be the same, but they differ between the two platforms.

Screenshots

Keras Adversarial Image: Keras_1_1_0

Tensorflow Adversarial Image: Tensorflow_1_1_0

System information For Keras Test on Python 3.6.10

adversarial_robustness_toolbox=1.1.0
art=4.5
Keras=2.2.3
matplotlib=3.1.3
numpy=1.18.1
tensorflow=1.13.1

System information For TensorflowV2 Test on Python 3.7.4

adversarial_robustness_toolbox=1.1.0
art=4.5
Keras=2.2.4
matplotlib=3.1.1
numpy=1.17.2
tensorflow=2.1.0

All the tests were performed on Ubuntu 18.04.3 LTS (bionic).

Hi @shashankkotyan Thank you very much for using ART and raising this issue! To get a better understanding, could you please provide additional information about your model (structure, loss function, etc.)?

Thank you for your prompt reply @beat-buesser.

I have used a vanilla ResNet architecture whose weights I load from a model weights (.h5) file. In both experiments, Keras and TensorflowV2, I use the same .h5 file. The only difference is that the first experiment uses the Keras module and the second uses the TensorflowV2 module.

Code Snippet:

In the Keras experiment:
from keras import initializers, layers, models, optimizers, regularizers

In the TensorflowV2 experiment:
from tensorflow.keras import initializers, layers, models, optimizers, regularizers

img_rows, img_cols, img_channels = 32,32,3
stack_n = 5
weight_decay = 0.0001
optimizer = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)

def residual_block(img_input, out_channel, increase=False):
    if increase: stride = (2,2)
    else: stride = (1,1)
    x = img_input
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=stride,padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    if increase:
        projection = layers.Conv2D(out_channel, kernel_size=(1,1), strides=(2,2), padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
        return layers.add([x, projection])
    else: return layers.add([img_input, x])

img_input = layers.Input(shape=(img_rows, img_cols, img_channels))

x = layers.Conv2D(filters=16,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
for _ in range(stack_n):    x = residual_block(x, 16, False)
x = residual_block(x, 32, True)
for _ in range(1, stack_n): x = residual_block(x, 32, False)
x = residual_block(x, 64, True)
for _ in range(1, stack_n): x = residual_block(x, 64, False)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(10, name='Output', activation='softmax', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)

model = models.Model(img_input, x)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

model.load_weights(f"model_weights.h5")

@shashankkotyan I have used your code above to train a model with Keras 2.3.1 and, after loading it from the h5-file, have tested it with KerasClassifier and TensorFlowV2Classifier with TensorFlow 2.1. I received identical adversarial examples in both cases using FastGradientMethod. I have also checked that the loss gradients provided by the two classifiers are identical. So far I'm not able to reproduce your observation.
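A gradient check of the kind described above could look roughly like the following. This is only a sketch: `compare_gradients` is a hypothetical helper, and random arrays stand in for the loss gradients that would in practice be obtained from the two classifiers.

```python
import numpy as np

def compare_gradients(grad_a, grad_b, rtol=1e-5, atol=1e-7):
    """Return (match, max_abs_diff, sign_agreement) for two gradient arrays."""
    grad_a = np.asarray(grad_a, dtype=np.float64)
    grad_b = np.asarray(grad_b, dtype=np.float64)
    if grad_a.shape != grad_b.shape:
        raise ValueError(f"shape mismatch: {grad_a.shape} vs {grad_b.shape}")
    max_abs_diff = float(np.max(np.abs(grad_a - grad_b)))
    # With norm=inf, FGSM only uses the sign of the gradient, so the fraction
    # of matching signs indicates whether the adversarial examples coincide.
    sign_agreement = float(np.mean(np.sign(grad_a) == np.sign(grad_b)))
    match = bool(np.allclose(grad_a, grad_b, rtol=rtol, atol=atol))
    return match, max_abs_diff, sign_agreement

# Stand-in data (assumption): replace with the gradients of the two classifiers.
rng = np.random.default_rng(0)
g_keras = rng.normal(size=(1, 32, 32, 3))
g_tf = g_keras * (1.0 + 1e-8)  # tiny relative perturbation, signs unchanged

match, max_diff, agreement = compare_gradients(g_keras, g_tf)
print(match, max_diff, agreement)
```

A low sign agreement would point to the two classifiers differentiating different loss values, even when the model weights are identical.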

Can you confirm that you are using the same h5-file for both of your experiments (it could have been overwritten)? Could you provide a single example script that returns the difference that you have observed? I recommend setting clip_values=(0,1) or clip_values=(0,255) for the ART classifiers; this makes sure that the pixels of the adversarial example stay in the valid range.
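To illustrate why the clip values matter, here is a minimal NumPy sketch of a single L-infinity FGSM step. It is the textbook update rule, not ART's implementation, and `fgsm_step` is a hypothetical name used only for this example.

```python
import numpy as np

def fgsm_step(x, grad, eps, clip_values=None):
    """One L-infinity FGSM step: move each pixel by eps along the gradient sign."""
    x_adv = x + eps * np.sign(grad)
    if clip_values is not None:
        lower, upper = clip_values
        # Keep every pixel inside the valid input range.
        x_adv = np.clip(x_adv, lower, upper)
    return x_adv

x = np.array([0.05, 0.50, 0.95])   # three pixels in the range [0, 1]
grad = np.array([-1.3, 0.2, 4.0])  # made-up loss gradient

unclipped = fgsm_step(x, grad, eps=0.3)
clipped = fgsm_step(x, grad, eps=0.3, clip_values=(0.0, 1.0))
print(unclipped)
print(clipped)
```

Without clipping, the first and last pixels leave the [0, 1] range, so the model would be evaluated on inputs that are not valid images.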

@beat-buesser I have checked the h5 file and it remains the same across my tests. It has an accuracy of 92.7% on the Cifar-10 Test Dataset, as reported by the sample script below. I have also checked the clip value setting: since I pass preprocessed images to the attack, I use clip_values=(0,1).

A summary of adversarial accuracy on the first 100 samples is

Version                  With Clip Value    Without Clip Value
Keras Version            47/100             59/100
Tensorflow V2 Version    9/100              26/100

If possible, can you also check the adversarial accuracy across multiple images?

Also, I use two environments to reproduce this: one has Tensorflow 2.1.0, the other has Tensorflow 1.13.1 and Keras 2.2.3.

The h5 file I use for the tests: model_weights.zip

Code Example to Reproduce

import os, sys
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

import tensorflow as tf
tf.get_logger().setLevel("ERROR")

# Change these parameters to select the configuration under test.
keras_opt = True
clip_opt  = False

fname = ""
if keras_opt: fname = f"{fname}KerasVersion"
else:         fname = f"{fname}TensorflowVersion"

if clip_opt:  fname = f"{fname} With ClipValues"
else:         fname = f"{fname} Without ClipValues"

if keras_opt:
    from keras import datasets, initializers, layers, models, optimizers, regularizers
else:
    from tensorflow.keras import datasets, initializers, layers, models, optimizers, regularizers 

import numpy as np

num_images   = {'train': 50000, 'test': 10000}

num_classes  = 10
dataset_name = 'Cifar10'
class_names  = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

mean = [125.307, 122.95, 113.865]
std  = [62.9932, 62.0887, 66.7048]

(raw_x_train, raw_y_train), (raw_x_test, raw_y_test) = datasets.cifar10.load_data()
raw_y_train, raw_y_test = raw_y_train[:,0], raw_y_test[:,0]

def color_preprocess(imgs):
    if imgs.ndim < 4: imgs = np.array([imgs])
    imgs = imgs.astype('float32')
    for i in range(3): imgs[:,:,:,i] = (imgs[:,:,:,i] - mean[i]) / std[i]
    return imgs

def color_postprocess(imgs):
    if imgs.ndim < 4: imgs = np.array([imgs])
    imgs = imgs.astype('float32')
    for i in range(3): imgs[:,:,:,i] = (imgs[:,:,:,i] * std[i]) + mean[i]
    return imgs.astype(int)

img_rows, img_cols, img_channels = 32,32,3
stack_n = 5
weight_decay = 0.0001
optimizer = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)

def residual_block(img_input, out_channel, increase=False):
    if increase: stride = (2,2)
    else: stride = (1,1)
    x = img_input
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=stride,padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    if increase:
        projection = layers.Conv2D(out_channel, kernel_size=(1,1), strides=(2,2), padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
        return layers.add([x, projection])
    else: return layers.add([img_input, x])

img_input = layers.Input(shape=(img_rows, img_cols, img_channels))

x = layers.Conv2D(filters=16,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
for _ in range(stack_n):    x = residual_block(x, 16, False)
x = residual_block(x, 32, True)
for _ in range(1, stack_n): x = residual_block(x, 32, False)
x = residual_block(x, 64, True)
for _ in range(1, stack_n): x = residual_block(x, 64, False)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(num_classes, name='Output', activation='softmax', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)

model = models.Model(img_input, x)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

model.load_weights(f"model_weights.h5")

pred = np.argmax(model.predict(color_preprocess(raw_x_test)), axis=1)

# Accuracy of the model remains 0.927 across the tests which confirms that the h5 file is not overwritten.
print(f"{np.sum(pred==raw_y_test)/len(raw_y_test):.3f}")

from art import attacks, classifiers

if keras_opt:
    if clip_opt:
        classifier = classifiers.KerasClassifier(model, use_logits=False, channel_index=3, clip_values=(0,1), defences=None, preprocessing=(0, 1), input_layer=0, output_layer=0)
    else:
        classifier = classifiers.KerasClassifier(model, use_logits=False, channel_index=3, clip_values=None, defences=None, preprocessing=(0, 1), input_layer=0, output_layer=0)
else:
    if clip_opt:
        classifier = classifiers.TensorFlowV2Classifier(model, num_classes, (img_rows, img_cols, img_channels), loss_object=tf.keras.losses.CategoricalCrossentropy(), train_step=None, channel_index=3, clip_values=(0,1), defences=None, preprocessing=(0, 1))
    else:
        classifier = classifiers.TensorFlowV2Classifier(model, num_classes, (img_rows, img_cols, img_channels), loss_object=tf.keras.losses.CategoricalCrossentropy(), train_step=None, channel_index=3, clip_values=None, defences=None, preprocessing=(0, 1))

attacker = attacks.evasion.FastGradientMethod(classifier=classifier, norm=np.inf, targeted=False, eps=0.3)

def attack(x,y):
    adv_x           = color_postprocess(attacker.generate(color_preprocess(x)))[0]
    prior_probs     = model.predict(color_preprocess(x))[0]
    predicted_probs = model.predict(color_preprocess(adv_x))[0]
    actual_class    = y # np.argmax(prior_probs)
    predicted_class = np.argmax(predicted_probs)
    success         = predicted_class != actual_class
    return adv_x, success

samples = 100
adv_xs  = []
succeses = []
for x, y in zip(raw_x_test[:samples], raw_y_test[:samples]):
    adv_x, success = attack(x, y)
    adv_xs   += [adv_x]
    succeses += [success]

grid = np.sqrt(samples).astype(int)

original = raw_x_test[:samples]
original = original.reshape(grid, grid, img_rows, img_cols, img_channels).swapaxes(1, 2).reshape(grid*img_rows, grid*img_cols, img_channels)
adversarial = np.array(adv_xs)
adversarial =adversarial.reshape(grid, grid, img_rows, img_cols, img_channels).swapaxes(1, 2).reshape(grid*img_rows, grid*img_cols, img_channels)

indices = np.where(np.array(succeses) == True)[0]

from matplotlib import pyplot as plt

fig        = plt.figure(1, figsize=(20,10), dpi=300)
(ax1, ax2) = fig.subplots(1,2)
ax1.imshow(original.astype(int))
ax2.imshow(adversarial.astype(int))
ax1.set_xticks([]); ax2.set_xticks([])
ax1.set_yticks([]); ax2.set_yticks([])
ax1.set_xlabel("Original Images")
ax2.set_xlabel(f"Adversarial Images {indices}")
fig.tight_layout()
fig.savefig(f"{fname} Adversarial Accuracy {np.sum(succeses)} out of {samples}", bbox_inches="tight", dpi=300)

Images Generated by the Code

Keras Version With Clip Values (Adv Acc 47/100): KerasVersion With ClipValues Adversarial Accuracy 47 out of 100

Keras Version Without Clip Values (Adv Acc 59/100): KerasVersion Without ClipValues Adversarial Accuracy 59 out of 100

Tensorflow Version With Clip Values (Adv Acc 9/100): TensorflowVersion With ClipValues Adversarial Accuracy 9 out of 100

Tensorflow Version Without Clip Values (Adv Acc 26/100): TensorflowVersion Without ClipValues Adversarial Accuracy 26 out of 100

Edited Issue Comment to include more specific details about the testing environments.

@shashankkotyan Thank you very much for the great example script! Sorry for the delay, but I think I have finally identified the reasons for your observations.

The current version of TensorFlowV2Classifier in ART 1.1.0 only supports SparseCategoricalCrossentropy because it calls the loss function with index labels. Unfortunately, it does not warn or inform the user. This was fixed in commit 46eeb2ff0ec4aec6d07c6b799611b36fc84768bb, which is already on branch dev_1.2.0 and will be published in ART 1.2.0 in a few weeks.
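The difference between the two label formats can be sketched in plain NumPy. The functions below are simplified stand-ins written for this example, not the TensorFlow or ART implementations: the categorical form expects one-hot labels of shape (N, C), the sparse form expects integer class indices of shape (N,), and the two give the same value only when each receives the format it expects.

```python
import numpy as np

def categorical_crossentropy(y_one_hot, probs, eps=1e-12):
    """Mean cross-entropy for one-hot labels of shape (N, C)."""
    return float(np.mean(-np.sum(y_one_hot * np.log(probs + eps), axis=1)))

def sparse_categorical_crossentropy(y_index, probs, eps=1e-12):
    """Mean cross-entropy for integer class labels of shape (N,)."""
    picked = probs[np.arange(len(y_index)), y_index]
    return float(np.mean(-np.log(picked + eps)))

def to_one_hot(y_index, num_classes):
    """Convert integer class labels to one-hot rows."""
    return np.eye(num_classes)[y_index]

# Made-up softmax outputs for two samples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
y_index = np.array([0, 1])
y_one_hot = to_one_hot(y_index, 3)

loss_sparse = sparse_categorical_crossentropy(y_index, probs)
loss_categorical = categorical_crossentropy(y_one_hot, probs)
print(loss_sparse, loss_categorical)
```

If index labels reach a loss that expects one-hot labels, the value being differentiated is no longer the intended cross-entropy, and a gradient-sign attack such as FGSM can then perturb the image in the wrong direction.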

If I change the two loss function definitions in your script to use SparseCategoricalCrossentropy, I observe identical success rates with ART v1.1.0 for all combinations reported above. Small variations in the success rates, on the order of 1%, can sometimes be observed; these could be caused by numerical differences between the implementations in the external frameworks. The two lines in question are:

model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

and

... , loss_object=tf.keras.losses.SparseCategoricalCrossentropy(), ...

in the lines creating TensorFlowV2Classifier classifiers.

A few things that I have changed in your script:

- ART expects raw images as input to functions like attack.generate and classifier.predict, most likely in the range [0, 1] or [0, 255]; set clip_values to (0,1) or (0,255) respectively.
- Following the item above, instead of using your function color_preprocess, pass a tuple (mean, std) as the preprocessing argument of the classifiers. ART uses these values internally to scale the gradients correctly and to evaluate the model in certain attacks. mean and std can be sequences or arrays, which will be broadcast onto the input/image data.
- The attack budget eps scales with the pixel range of the raw images: eps=0.1 in range [0, 1] is the same as eps=25.5 in range [0, 255].
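The last two points can be sketched in NumPy as follows. The helpers `standardize` and `rescale_eps` are hypothetical names introduced for this illustration and are not part of ART; only the mean and std values are taken from the script.

```python
import numpy as np

# Per-channel statistics from the script, for raw pixels in [0, 255].
mean = np.array([125.307, 122.95, 113.865])
std = np.array([62.9932, 62.0887, 66.7048])

def standardize(x, mean, std):
    """Broadcast per-channel mean/std over an (N, H, W, C) batch."""
    # Since z = (x - mean) / std, a gradient with respect to the raw pixels x
    # equals the gradient with respect to z divided by std (chain rule).
    return (x - mean) / std

def rescale_eps(eps, from_range, to_range):
    """Convert an attack budget between two pixel ranges."""
    from_span = from_range[1] - from_range[0]
    to_span = to_range[1] - to_range[0]
    return eps * to_span / from_span

batch = np.full((2, 32, 32, 3), 255.0)  # two all-white raw images
z = standardize(batch, mean, std)
eps_255 = rescale_eps(0.1, (0, 1), (0, 255))
print(z.shape, eps_255)
```

By the same conversion, eps=2 on raw [0, 255] images corresponds to roughly 0.0078 in the [0, 1] range.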

This is your script with the modifications that I have used for my experiments. Please let me know if you can repeat the experiments; I hope it works:

import os, sys
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

import tensorflow as tf
tf.get_logger().setLevel("ERROR")

# Change these parameters to select the configuration under test.
keras_opt = True
clip_opt  = True
clip_values = (0, 255)

fname = ""
if keras_opt: fname = f"{fname}KerasVersion"
else:         fname = f"{fname}TensorflowVersion"

if clip_opt:  fname = f"{fname} With ClipValues"
else:         fname = f"{fname} Without ClipValues"

if keras_opt:
    from keras import datasets, initializers, layers, models, optimizers, regularizers, utils, backend, __version__
else:
    from tensorflow.keras import datasets, initializers, layers, models, optimizers, regularizers, utils, backend, __version__

import numpy as np

num_images   = {'train': 50000, 'test': 10000}

num_classes  = 10
dataset_name = 'Cifar10'
class_names  = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

mean = [125.307, 122.95, 113.865]
std  = [62.9932, 62.0887, 66.7048]

(raw_x_train, raw_y_train), (raw_x_test, raw_y_test) = datasets.cifar10.load_data()

raw_x_train = raw_x_train.astype('float32')
raw_x_test = raw_x_test.astype('float32')

img_rows, img_cols, img_channels = 32,32,3
stack_n = 5
weight_decay = 0.0001
optimizer = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)

def residual_block(img_input, out_channel, increase=False):
    if increase: stride = (2,2)
    else: stride = (1,1)
    x = img_input
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=stride,padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(out_channel,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)
    if increase:
        projection = layers.Conv2D(out_channel, kernel_size=(1,1), strides=(2,2), padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
        return layers.add([x, projection])
    else: return layers.add([img_input, x])

img_input = layers.Input(shape=(img_rows, img_cols, img_channels))

x = layers.Conv2D(filters=16,kernel_size=(3,3),strides=(1,1),padding='same', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(img_input)
for _ in range(stack_n):    x = residual_block(x, 16, False)
x = residual_block(x, 32, True)
for _ in range(1, stack_n): x = residual_block(x, 32, False)
x = residual_block(x, 64, True)
for _ in range(1, stack_n): x = residual_block(x, 64, False)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(10, name='Output', activation='softmax', kernel_initializer=initializers.he_normal(), kernel_regularizer=regularizers.l2(weight_decay))(x)

model = models.Model(img_input, x)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

model.load_weights(f"model_weights.h5")

from art import attacks, classifiers

if keras_opt:
    if clip_opt:
        classifier = classifiers.KerasClassifier(model, use_logits=False, channel_index=3, clip_values=clip_values, defences=None, preprocessing=(mean, std), input_layer=0, output_layer=0)
    else:
        classifier = classifiers.KerasClassifier(model, use_logits=False, channel_index=3, clip_values=None, defences=None, preprocessing=(mean, std), input_layer=0, output_layer=0)
else:
    if clip_opt:
        classifier = classifiers.TensorFlowV2Classifier(model, num_classes, (img_rows, img_cols, img_channels), loss_object=tf.keras.losses.SparseCategoricalCrossentropy(), train_step=None, channel_index=3, clip_values=clip_values, defences=None, preprocessing=(mean, std))
    else:
        classifier = classifiers.TensorFlowV2Classifier(model, num_classes, (img_rows, img_cols, img_channels), loss_object=tf.keras.losses.SparseCategoricalCrossentropy(), train_step=None, channel_index=3, clip_values=None, defences=None, preprocessing=(mean, std))

pred = np.argmax(classifier.predict(raw_x_test), axis=1)

# Accuracy of the model remains 0.927 across the tests which confirms that the h5 file is not overwritten.
print(f"{np.sum(pred==raw_y_test[:,0])/len(raw_y_test):.3f}")

attacker = attacks.evasion.FastGradientMethod(classifier=classifier, norm=np.inf, targeted=False, eps=2)

def attack(x,y):
    x = np.expand_dims(x, axis=0)
    adv_x           = attacker.generate(x)
    prior_probs     = classifier.predict(x)[0]
    predicted_probs = classifier.predict(adv_x.astype(np.float32))[0]
    actual_class    = y # np.argmax(prior_probs)
    predicted_class = np.argmax(predicted_probs)
    success         = predicted_class != actual_class
    return adv_x, success

samples = 100
adv_xs  = []
succeses = []
for x, y in zip(raw_x_test[:samples], raw_y_test[:samples]):
    adv_x, success = attack(x, y)
    adv_xs   += [adv_x]
    succeses += [success]

print(np.sum(succeses))

@beat-buesser Thank you for your thorough explanation. I have checked your script and it is producing expected results. Thank you for your suggestions to make the script more crisp and concise.

A summary of adversarial accuracy on the first 100 samples on the modified script is

Version                  With Clip Value    Without Clip Value
Keras Version            51/100             51/100
Tensorflow V2 Version    53/100             53/100

I agree that there could be small variations across platforms, but the differences in the earlier script were large, which is why I opened this issue.

Until ART 1.2.0 is released, I would recommend stating in the current documentation that the TensorflowV2 module only supports SparseCategoricalCrossentropy, as this is not mentioned (or maybe I have missed it).

@shashankkotyan Thank you very much for confirming the results and your suggestions!

@shashankkotyan Thank you for your help! This should now be fixed with the release of ART 1.1.1.

Trusted-AI / adversarial-robustness-toolbox

Difference in adversarial images and adversarial accuracy across different platforms (Keras Classifier and TensorflowV2 Classifier). #279
