Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

What changed with DeepFool in 1.0.0? #192

Closed marcoancona closed 4 years ago

marcoancona commented 5 years ago

Describe the bug: I have a simple Keras CNN with softmax activation, trained on MNIST. This was the result with ART 0.10.0: image

And this with 1.0.1: image

As you can see, without any other change, the adversarial accuracy with 1.0 is nearly 0, but only because the input images are totally destroyed (no longer classifiable by a human). The result with ART 0.10 seems much more reasonable.

What has changed? If I go through the changelog, I see that it might be related to the following point:

Generalize the classifiers of TensorFlow, Keras, PyTorch, and MXNet by removing assumptions on their output (logits or probabilities). The Boolean parameter logits has been removed from the Classifier API in methods predict and class_gradient. The predictions and gradients are now computed at the output of the model without any modifications. (#50, #75, #106, #115)

This also confused me because use_logits is still available on KerasClassifier, so I thought the behavior would be the same.

System information (please complete the following information):

beat-buesser commented 5 years ago

Hi @marcoancona Thank you very much for using ART and raising this issue. We will look at it as soon as possible. In the meantime, would you have a script available to share with us that reproduces the described behaviour?

marcoancona commented 5 years ago

I will try to build a minimal example. Running your DeepFool Keras MNIST unit test and comparing the results of the two ART versions should highlight the problem, as it is a very similar setup.

One suspicious line is https://github.com/IBM/adversarial-robustness-toolbox/blob/9c6ebc6567bb1533e3048973b02fd146cc1a73cc/art/classifiers/keras.py#L162 where use_logits seems to be ignored with categorical_crossentropy. Why?

Doing some other tests, I found that the behavior is indeed as expected if the model outputs logits. How should the case be handled where the model outputs probabilities?

Also, even assuming I can change my model to output logits, how can I compile it? When I try to set a logit-based loss function (i.e. model.compile(loss=keras.losses.CategoricalCrossentropy(from_logits=True))), ART complains that the loss is not recognized.

beat-buesser commented 5 years ago

Hi @marcoancona

Ok, is it correct that with logits as output the behaviour is as expected in both versions?

The example in this Stack Overflow post shows how to train a Keras model that predicts logits: https://stackoverflow.com/questions/47036409/keras-how-to-get-unnormalized-logits-instead-of-probabilities
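
For reference, here is a minimal sketch of that approach (a simplified MNIST-shaped model, not the post verbatim): keep the softmax as its own Activation layer so the logits tensor stays addressable after training.

from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Activation

# Train with probabilities, but keep the softmax as a separate layer
# so the logits remain accessible afterwards.
softmax_model = Sequential()
softmax_model.add(Flatten(input_shape=(28, 28, 1)))
softmax_model.add(Dense(10))               # logits
softmax_model.add(Activation('softmax'))   # probabilities, used for training
softmax_model.compile(loss='categorical_crossentropy',
                      optimizer='adam', metrics=['accuracy'])
# ... softmax_model.fit(...) as usual ...

# Cut the model just before the softmax to obtain a logits predictor.
logits_model = Model(inputs=softmax_model.input,
                     outputs=softmax_model.layers[-2].output)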

Many attacks are much stronger on logits than on probabilities, because the softmax saturates for confident predictions and the gradients flowing through it become vanishingly small.
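
A quick numerical illustration of that saturation (a hand-picked logit vector, not taken from the issue):

import numpy as np

z = np.array([10.0, 0.0, 0.0])     # logits of a confidently classified input
p = np.exp(z) / np.exp(z).sum()    # softmax probabilities
# The diagonal of the softmax Jacobian is p_i * (1 - p_i); for the top
# class this is ~9e-5 here, so gradients through the softmax are nearly
# flat, while gradients w.r.t. the logits are not attenuated.
print(p[0] * (1 - p[0]))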

The breaking change introduced with version 1.0, cited above, is that all classifiers use the output that their model predicts (e.g. probabilities, logits, etc.) and no longer try to find the logits internally if a model provides probabilities, as was the case with ART 0.x. This change was motivated by the increasing diversity of models: it became increasingly difficult to guarantee finding the correct logits, and the internal rewiring masked what the adversarial algorithm was actually running on (probabilities or logits), which is important for the accuracy of scientific experiments.
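
In practice the choice is now explicit at wrapping time. A hedged sketch, reusing the softmax_model and logits_model names from the sketch above:

from art.classifiers import KerasClassifier

# ART 1.x takes the model output at face value; it no longer tries to
# recover logits hidden behind a softmax.
clf_probs = KerasClassifier(model=softmax_model, use_logits=False)
clf_logits = KerasClassifier(model=logits_model, use_logits=True)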

marcoancona commented 5 years ago

The example on Stack Overflow does not work (anymore?). As I mentioned before, even if I want to use logits, I can't find a way to pass the loss function in a way that ART accepts.

Please see the following example (inspired by the Stack Overflow answer):

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import backend as K

print(f'Using Tensorflow {tf.__version__}')
print(f'Using Keras {keras.__version__}')

# input image dimensions
img_rows, img_cols = (28, 28)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

model = Sequential()
model.add(Flatten(input_shape=input_shape))
model.add(Dense(10))  # <- logits output (no softmax activation)

def my_categorical_crossentropy(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred, from_logits=True)

model.compile(loss=my_categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=1,
          verbose=0,
)

from art.classifiers import KerasClassifier
from art.attacks.deepfool import DeepFool

classifier = KerasClassifier(
    model,
    use_logits=False,  # < does it make any difference?
    clip_values=(np.min(x_test), np.max(x_test))
)

attack = DeepFool(classifier)
attack.generate(x_test)

and the result:

Using Tensorflow 1.15.0
Using Keras 2.2.4-tf
  File "/usr/local/lib/python3.6/dist-packages/art/classifiers/keras.py", line 84, in __init__
    self._initialize_params(model, use_logits, input_layer, output_layer)
  File "/usr/local/lib/python3.6/dist-packages/art/classifiers/keras.py", line 144, in _initialize_params
    loss_function = getattr(k, self._model.loss.__name__)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
    attr = getattr(self._tfmw_wrapped_module, name)
AttributeError: module 'tensorflow.python.keras.api._v1.keras.backend' has no attribute 'my_categorical_crossentropy'

I should use something like "categorical_crossentropy" for the loss of the Keras model, but then I cannot use logits as output.

beat-buesser commented 5 years ago

I have made the following changes to your script above and have tested it with TensorFlow 1.14:

def categorical_crossentropy(y_true, y_pred):
    return keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=True, label_smoothing=0)

model.compile(loss=categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

This should make your script run.

I think there might be a chance here to improve the usability of the Keras classifier. The code currently checks the name of the compiled loss function and expects it to be one of the functions/names provided by Keras; in that case, the use_logits argument no longer has any effect. I think we might have to update the code and documentation to make this clearer.
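
The failing lookup is visible in the traceback above; roughly, the classifier does the equivalent of the following, so any custom wrapper whose __name__ does not match a Keras built-in raises AttributeError:

from tensorflow.keras import backend as k

# Paraphrased from art/classifiers/keras.py (see the traceback above);
# `model` is the compiled Keras model from the script. The compiled loss
# is looked up by name in the Keras backend, so a function named
# `my_categorical_crossentropy` is not found there.
loss_function = getattr(k, model.loss.__name__)

This is also why renaming the wrapper to categorical_crossentropy, as in the snippet above, makes the lookup succeed.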

marcoancona commented 5 years ago

This seems to work, thanks for your help. From a user perspective, I would suggest the following improvements:

beat-buesser commented 5 years ago

Thank you for your feedback! I agree about use_logits and custom loss functions for KerasClassifier; we should definitely support them. I think this should be straightforward to implement and will be included in the next release, 1.1.0. Let's keep this issue open until then.

We could also extend the example by showing a case with probabilities and a case with logits and comparing their adversarial effectiveness.
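
For example, a hedged sketch of such a comparison (reusing softmax_model, logits_model, and the MNIST arrays from the script above; not a definitive benchmark):

import numpy as np
from art.classifiers import KerasClassifier
from art.attacks.deepfool import DeepFool

# Wrap the same trained weights once as probabilities, once as logits,
# and compare adversarial accuracy on a small test slice.
for name, m, is_logits in [('probabilities', softmax_model, False),
                           ('logits', logits_model, True)]:
    clf = KerasClassifier(model=m, use_logits=is_logits,
                          clip_values=(0.0, 1.0))
    x_adv = DeepFool(clf).generate(x_test[:100])
    preds = np.argmax(clf.predict(x_adv), axis=1)
    acc = np.mean(preds == np.argmax(y_test[:100], axis=1))
    print(f'adversarial accuracy ({name}): {acc:.3f}')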