bethgelab / foolbox

A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
https://foolbox.jonasrauber.de
MIT License

CarliniWagnerL2Attack does not work on MNIST? #255

Closed peck94 closed 5 years ago

peck94 commented 5 years ago

Hello,

I suspect I may have encountered a bug in the implementation of the Carlini-Wagner L2 attack. Consider the following code:

import numpy as np
import keras
import foolbox
from keras.layers import Conv2D, MaxPooling2D, Dropout, BatchNormalization, Flatten, Dense, Input
from keras.models import Model
from keras.datasets import mnist
from foolbox.models import KerasModel
from foolbox.attacks import CarliniWagnerL2Attack, DeepFoolAttack
from foolbox.criteria import Misclassification
from foolbox.distances import MSE

NUM_CLASSES = 10
TRIGGER_BUG = True   # set this to True to use CarliniWagnerL2Attack,
                     # which triggers the bug, or False for DeepFoolAttack,
                     # which works perfectly.

def load_datasets():
    # the data, split between train and test sets
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train.reshape(*x_train.shape, 1), x_test.reshape(*x_test.shape, 1)

    # rescale data
    m, M = x_train.min(), x_train.max()

    x_train = (x_train - m) / (M - m)
    x_test = (x_test - m) / (M - m)

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(y_train, NUM_CLASSES)
    y_test = keras.utils.to_categorical(y_test, NUM_CLASSES)

    return x_train.astype(np.float32), y_train.astype(np.int32), \
            x_test.astype(np.float32), y_test.astype(np.int32)

def create_model():
    # construct the model
    input_img = Input(shape=(28, 28, 1))

    x = Conv2D(32, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1))(input_img)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = BatchNormalization()(x)
    x = Dropout(.3)(x)
    x = Conv2D(64, (5, 5), activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = BatchNormalization()(x)
    x = Dropout(.3)(x)
    x = Flatten()(x)
    x = Dense(1024, activation='relu')(x)
    x = Dense(NUM_CLASSES)(x)

    return Model(input_img, x)

if __name__ == '__main__':
    # model and data
    model = create_model()
    x_train, y_train, x_test, y_test = load_datasets()

    # generate adversarials
    fmodel = KerasModel(model, bounds=(0, 1), predicts='logits')
    Attack = CarliniWagnerL2Attack if TRIGGER_BUG else DeepFoolAttack
    attack = Attack(fmodel, Misclassification(), MSE)
    ys = model.predict(x_test)
    for i, (x, y) in enumerate(zip(x_test, ys)):
        print('Sample {}/{}...'.format(i+1, x_test.shape[0]), end='\r')
        adversarial = attack(x, y.argmax())
    print()

This script just creates a Keras model for MNIST and attempts to generate adversarial examples using either CarliniWagnerL2Attack or DeepFoolAttack. When I run it with DeepFoolAttack, everything works as expected. However, when I switch to CarliniWagnerL2Attack, I get the following error:

Traceback (most recent call last):
  File "mwe.py", line 65, in <module>
    adversarial = attack(x, y.argmax())
  File "/home/jpeck/.local/lib/python3.5/site-packages/foolbox/attacks/base.py", line 137, in wrapper
    _ = call_fn(self, a, label=None, unpack=None, **kwargs)
  File "/home/jpeck/.local/lib/python3.5/site-packages/foolbox/attacks/carlini_wagner.py", line 149, in __call__
    confidence, min_, max_)
  File "/home/jpeck/.local/lib/python3.5/site-packages/foolbox/attacks/carlini_wagner.py", line 222, in loss_function
    is_adv_loss_grad = a.backward(logits_diff_grad, x)
  File "/home/jpeck/.local/lib/python3.5/site-packages/foolbox/adversarial.py", line 449, in backward
    gradient = self.__model.backward(gradient, image)
  File "/home/jpeck/.local/lib/python3.5/site-packages/foolbox/models/keras.py", line 178, in backward
    assert gradient.shape == image.shape
AssertionError

Upon further investigation, it seems image.shape is (28, 28, 1), as one would expect, but gradient.shape is (10,) for some reason. The error only seems to occur on MNIST and Fashion-MNIST; on the CIFAR-10 and SVHN data sets, CarliniWagnerL2Attack appears to work fine. My Python version is 3.5.2, my Keras version is 2.2.4, and my Foolbox version is 1.8.0.
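
Incidentally, the data sets on which the attack fails are exactly the single-channel ones. Here is a quick NumPy sketch of the shape difference I have in mind (just my own illustration, not Foolbox code):

import numpy as np

# MNIST / Fashion-MNIST inputs carry a trailing channel axis of size 1,
# which silently disappears if it is ever squeezed; CIFAR-10 / SVHN inputs
# have three channels and keep their shape.
mnist_like = np.zeros((28, 28, 1), dtype=np.float32)
cifar_like = np.zeros((32, 32, 3), dtype=np.float32)

print(np.squeeze(mnist_like).shape)  # (28, 28)    -- channel axis lost
print(np.squeeze(cifar_like).shape)  # (32, 32, 3) -- unchanged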

Is this a bug or am I doing something wrong?

wielandbrendel commented 5 years ago

Dear @peck94, thanks for the bug report! There was indeed a small issue in the backward function of the Keras model through which the last axis was squeezed out. This should be fixed now.
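
Conceptually, the kind of shape guard that prevents this looks like the sketch below; it is purely illustrative, not the literal patch (the actual change lives in the Keras model's backward method):

import numpy as np

def restore_squeezed_axis(gradient, image):
    # Illustration only: if the trailing singleton axis (e.g. the single
    # channel of an MNIST image) was squeezed out of the gradient somewhere
    # along the way, add it back so the gradient matches the input shape.
    if image.shape[-1] == 1 and gradient.shape == image.shape[:-1]:
        gradient = gradient[..., np.newaxis]
    assert gradient.shape == image.shape
    return gradient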

peck94 commented 5 years ago

Happy to help! fb05fb27bae5a4b2d2a1684a99c8eb1a300414a2 seems to fix it for me. Thank you!