Closed marcoancona closed 4 years ago
Hi @marcoancona Thank you very much for using ART and raising this issue. We will look at it as soon as possible. In the meantime, would you have a script available to share with us that reproduces the described behaviour?
I will try to build a minimal example. Running your DeepFool Keras MNIST unit test and comparing the results of the two ART versions should highlight the problem, as it is a very similar setup.
One suspicious line is https://github.com/IBM/adversarial-robustness-toolbox/blob/9c6ebc6567bb1533e3048973b02fd146cc1a73cc/art/classifiers/keras.py#L162
Seems like use_logits is ignored with categorical_crossentropy. Why?
Doing some other tests, I found that the behavior is indeed as expected if the model outputs logits. How should I handle the case where the model outputs probabilities?
Also, even assuming I can change my model to output logits, how can I compile it? When I try to set a logit-based loss function (i.e. model.compile(loss=keras.losses.CategoricalCrossentropy(from_logits=True))), ART complains that the loss is not recognized.
Hi @marcoancona
Ok, is it correct that for logits as output the behaviour in both versions is as expected?
The example in this Stack Overflow post shows how to train a Keras model that predicts logits: https://stackoverflow.com/questions/47036409/keras-how-to-get-unnormalized-logits-instead-of-probabilities
Many attacks are much stronger on logits than on probabilities.
The breaking change introduced with version 1.0, cited above, is that all classifiers use the output that their model predicts (e.g. probabilities, logits, etc.) and no longer try to internally find the logits if a model provides probabilities (as was the case with ART 0.x). This change was made because, with the increasing diversity of models, it became increasingly difficult to guarantee finding the correct logits, and the internal lookup masked what the adversarial algorithm was actually running on (probabilities or logits), which is important for the accuracy of scientific experiments.
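To illustrate why it matters whether an attack sees probabilities or logits, here is a minimal pure-Python sketch. It is not ART code, and the helper names (softmax, softmax_jacobian, frobenius_norm) are made up for this illustration. It shows that when a softmax output is saturated, the derivative of the probabilities with respect to the logits is close to zero, so a gradient-based attack running on probabilities may receive far less signal than one running on logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(logits):
    """Jacobian of softmax: d p_i / d z_j = p_i * (delta_ij - p_j)."""
    p = softmax(logits)
    n = len(p)
    return [
        [p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
        for i in range(n)
    ]


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


# Illustrative values only: a model that is very confident about class 0.
confident_logits = [20.0, 0.0, 0.0]
n = len(confident_logits)

# The Jacobian of the logits with respect to themselves is the identity.
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

grad_norm_on_probabilities = frobenius_norm(softmax_jacobian(confident_logits))
grad_norm_on_logits = frobenius_norm(identity)

print("gradient norm through softmax:", grad_norm_on_probabilities)
print("gradient norm on logits:", grad_norm_on_logits)
```

How a specific attack reacts to such small gradients depends on its algorithm; this sketch only shows the size of the gradient signal, not the behaviour of DeepFool itself.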
The example on Stack Overflow does not work (anymore?). As I mentioned before, even if I want to use logits, I can't find a way to pass the loss function in a form that is accepted by ART.
Please see the following example (inspired by the Stack Overflow answer):
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Dropout, Flatten, Lambda, Activation, AveragePooling2D
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import backend as K
import tensorflow as tf
print (f'Using Tensorflow {tf.__version__}')
print (f'Using Keras {keras.__version__}')
# input image dimensions
img_rows, img_cols = (28, 28)
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 255
x_test /= 255
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
model = Sequential()
model.add(Flatten(input_shape=input_shape))
model.add(Dense(10)) # < use logit output
def my_categorical_crossentropy(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred, from_logits=True)
model.compile(loss=my_categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=128,
          epochs=1,
          verbose=0,
          )
from art.classifiers import KerasClassifier
from art.attacks.deepfool import DeepFool
classifier = KerasClassifier(
    model,
    use_logits=False, # < does it make any difference?
    clip_values=(np.min(x_test), np.max(x_test))
)
attack = DeepFool(classifier)
attack.generate(x_test)
and the result:
Using Tensorflow 1.15.0
Using Keras 2.2.4-tf
File "/usr/local/lib/python3.6/dist-packages/art/classifiers/keras.py", line 84, in __init__
self._initialize_params(model, use_logits, input_layer, output_layer)
File "/usr/local/lib/python3.6/dist-packages/art/classifiers/keras.py", line 144, in _initialize_params
loss_function = getattr(k, self._model.loss.__name__)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
attr = getattr(self._tfmw_wrapped_module, name)
AttributeError: module 'tensorflow.python.keras.api._v1.keras.backend' has no attribute 'my_categorical_crossentropy'
I should use something like "categorical_crossentropy" for the loss of the Keras model, but then I cannot use logits as output.
I have made the following changes to your script above and have tested it with TensorFlow 1.14:
def categorical_crossentropy(y_true, y_pred):
    return keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=True, label_smoothing=0)
model.compile(loss=categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
This should make your script run.
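As a side note on what from_logits=True means conceptually, the following pure-Python sketch checks that a cross-entropy computed directly from logits (via log-softmax) matches the cross-entropy computed from the corresponding softmax probabilities. The function names and the sample values are assumptions made for this illustration; the sketch does not reproduce Keras internals such as batching or clipping.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy_from_logits(y_true, logits):
    """Cross-entropy where the model output is unnormalised logits."""
    return -sum(t * lp for t, lp in zip(y_true, log_softmax(logits)))


def cross_entropy_from_probabilities(y_true, probs):
    """Cross-entropy where the model output is already a probability vector."""
    return -sum(t * math.log(p) for t, p in zip(y_true, probs))


# Illustrative values only: one-hot label for class 1 and arbitrary logits.
y_true = [0.0, 1.0, 0.0]
logits = [2.0, 1.0, -1.0]
probs = [math.exp(lp) for lp in log_softmax(logits)]

loss_from_logits = cross_entropy_from_logits(y_true, logits)
loss_from_probs = cross_entropy_from_probabilities(y_true, probs)

print(loss_from_logits, loss_from_probs)
```

The two losses agree, so a model with a Dense(10) output trained with a logit-based loss should learn the same classifier as one with a softmax output and the probability-based loss; the difference is only in what the model exposes as its output.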
I think there might be a chance here to improve the usability of the Keras classifier. The code for the Keras classifier is currently checking for the name of the loss function and expects it to be one of the functions/names provided in Keras. In this case, the use_logits argument does not have any effect anymore. I think we might have to update the code and documentation to make this clearer.
This seems to work, thanks for your help. From a user perspective, I would suggest the following improvements:
- Remove use_logits from KerasClassifier. This is just too confusing and, worse, sometimes it is taken into account and sometimes not.
- Support loss generators (e.g. tf.keras.losses.CategoricalCrossentropy()) to avoid the need to build a custom loss function in order to use logits (but this is probably part of https://github.com/IBM/adversarial-robustness-toolbox/issues/176).
Thank you for your feedback! I agree about use_logits and the loss generators for KerasClassifier; we should definitely support them. I think this should be straightforward to implement and will be included in the next release 1.1.0. Let's keep this issue open until then.
We could also extend the example by showing a case with probabilities and logits and compare their adversarial effectiveness.
Describe the bug
I have a simple Keras CNN with Softmax activation, trained on MNIST. This was the result with ART 0.10.0:
And this with 1.0.1
As you can see, without any other change, on 1.0 the adversarial accuracy is nearly 0 but only because the input images are totally destroyed (not classifiable by a human). The result with ART 0.10 seems much more reasonable.
What has changed? If I go through the changelog, I see that it might be related to the following point:
This also confused me because use_logits is still available on KerasClassifier, so I thought the behavior would have been the same.
System information (please complete the following information):