Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

Error in Hop skip jump attack #307

Closed akshayag closed 3 years ago

akshayag commented 4 years ago

Describe the bug

RuntimeWarning: invalid value encountered in true_divide
result = grad / np.linalg.norm(grad)


beat-buesser commented 4 years ago

Hi @akshayag Thank you for your interest in ART! Could you please provide a more detailed description of your issue?

akshayag commented 4 years ago

Hi, I was trying to run the Hop Skip Jump attack using the following specification: adv_crafter = HopSkipJump(classifier, targeted=False, norm=2, max_iter=10, max_eval=100, init_eval=10, init_size=10)

However, after running a couple of iterations (or examples), the program eventually fails with the following error (I think a division by zero):

/adversarial-robustness-toolbox-master/art/attacks/evasion/hop_skip_jump.py:455: RuntimeWarning: invalid value encountered in true_divide
  result = grad / np.linalg.norm(grad)
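For context, this warning is what NumPy emits when a zero vector is divided by its own norm, i.e. a 0/0 operation; a minimal standalone sketch (not ART code) with a guarded variant:

import numpy as np

grad = np.zeros(3)                      # an all-zero gradient estimate
result = grad / np.linalg.norm(grad)    # 0/0 -> RuntimeWarning: invalid value encountered in true_divide
print(result)                           # [nan nan nan]

# Guarded normalization that avoids propagating NaN:
norm = np.linalg.norm(grad)
result = grad / norm if norm > 0 else np.zeros_like(grad)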

beat-buesser commented 4 years ago

Ok. Could you please provide additional information about the classifier and the classification task (e.g. model size, framework, training accuracy, etc.)?

I would like to find out whether this is a code issue or just a numerical issue.

beat-buesser commented 4 years ago

Or would you have a short code example that reproduces your issue?

akshayag commented 4 years ago

Hi,

The script is adversarial_training_cifar10.py, which is provided in the examples folder.

akshayag commented 4 years ago

# -*- coding: utf-8 -*-
"""
Trains a convolutional neural network on the CIFAR-10 dataset, then generates adversarial images using the
DeepFool attack and retrains the network on the training set augmented with the adversarial images.
"""
from __future__ import absolute_import, division, print_function, unicode_literals

import logging

from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Activation, Dropout
import numpy as np

from art.attacks import DeepFool
from art.attacks import FastGradientMethod
from art.attacks import SaliencyMapMethod
from art.attacks import UniversalPerturbation
from art.attacks import ProjectedGradientDescent
from art.attacks import BasicIterativeMethod
from art.attacks import HopSkipJump
from art.classifiers import KerasClassifier
from art.utils import load_dataset
import scipy.io as io

# Configure a logger to capture ART outputs; these are printed in console and the level of detail is set to INFO
logger = logging.getLogger()
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
formatter = logging.Formatter('[%(levelname)s] %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)

# Read CIFAR10 dataset
(x_train, y_train), (x_test, y_test), min_, max_ = load_dataset(str('cifar10'))
im_shape = x_train[0].shape

# Create Keras convolutional neural network - basic architecture from Keras examples
# Source here: https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Create classifier wrapper
classifier = KerasClassifier(model=model, clip_values=(min_, max_))
classifier.fit(x_train, y_train, nb_epochs=10, batch_size=128)

# Craft adversarial samples
logger.info('Create DeepFool attack')
# Note: only the last assignment below takes effect; the earlier attack objects are overwritten.
adv_crafter = DeepFool(classifier)
adv_crafter = BasicIterativeMethod(classifier, eps=0.05, eps_step=0.03, max_iter=10, targeted=False, batch_size=1)
adv_crafter = UniversalPerturbation(classifier, attacker='deepfool', attacker_params=None, delta=0.6, max_iter=100, eps=0.10, norm=np.inf)
adv_crafter = ProjectedGradientDescent(classifier, norm=np.inf, eps=0.05, eps_step=0.03, max_iter=10, targeted=False, num_random_init=0, batch_size=32)
adv_crafter = HopSkipJump(classifier, targeted=False, norm=2, max_iter=10, max_eval=100, init_eval=10, init_size=10)

beat-buesser commented 4 years ago

How do you call the generate method? Is it adv_crafter.generate(x_test)?

akshayag commented 4 years ago

Yes, I have used the generate method. It ran for multiple examples but failed at the end, maybe for one example.

logger.info('Craft attack on training examples')
x_train_adv = adv_crafter.generate(x_train)
logger.info('Craft attack test examples')
x_test_adv = adv_crafter.generate(x_test)


beat-buesser commented 4 years ago

Thank you very much! I have run generate, so far, for the first 2000 images of x_train and have not observed the reported error. Would it be possible for you to share an .h5 file of your Keras model so that I can repeat the experiment with the same weights? Do you know for which image of x_train the error occurs?
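For reference, a minimal sketch of saving and reloading the trained Keras model from the example script so that a run can be reproduced with the same weights; the file name is only a placeholder:

model.save('cifar10_cnn.h5')    # placeholder path; writes architecture and weights to an .h5 file

# Later, rebuild the ART classifier from the saved file:
from keras.models import load_model
from art.classifiers import KerasClassifier

model = load_model('cifar10_cnn.h5')
classifier = KerasClassifier(model=model, clip_values=(min_, max_))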

akshayag commented 4 years ago

I have used the same model that is used in adversarial_training_cifar10.py. I have not saved the model; however, I will try to run the code again and see whether it works, or at which sample the error occurs. I will let you know.

Thanks for your quick response.
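One rough way to locate the offending sample is to run the attack one input at a time and report any index whose output contains NaN; this is only a sketch, with adv_crafter and x_train as defined in the example script above:

import numpy as np

for i in range(len(x_train)):
    x_adv_i = adv_crafter.generate(x_train[i:i + 1])
    if np.isnan(x_adv_i).any():
        print('NaN adversarial output for training sample index', i)
        break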


keykholt commented 3 years ago

We've also run into an issue where result = grad / np.linalg.norm(grad) results in a 0/0 operation. This caused the resulting adversarial input to be NaN. In our case, we were performing an untargeted attack. We did a preliminary investigation and determined that this issue occurs when either:

  1. delta = 0. This can occur when the binary search returns an input such that original_samples = current_samples. It occurred infrequently, so it was hard to diagnose the reason.
  2. rnd_noise = 0. This occurred because eval_samples = current_samples. Even when all of the values that define eval_samples were non-zero, it would consistently return current_samples (even without clipping). Our guess is that when delta becomes too small (we tried 1e-7), the addition gets clipped to 0 for some reason.

In general, the issue seems to be that delta gets too small.
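One way to hedge against this failure mode, as a sketch only and not ART's actual implementation, is to guard the normalization and fall back to a random direction when the estimated gradient collapses to zero:

import numpy as np

def normalize_or_random(grad, rng=np.random):
    """Return grad / ||grad||, or a random unit vector if grad is numerically zero."""
    norm = np.linalg.norm(grad)
    if norm > 1e-12:
        return grad / norm
    # grad collapsed to zero (e.g. delta too small): pick a random direction instead
    fallback = rng.normal(size=grad.shape)
    return fallback / np.linalg.norm(fallback)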

keykholt commented 3 years ago

With further investigation, we isolated our issue to _init_sample(). On line 307, if the random image is successful, the code runs a binary search to try to optimize the image. However, random_image, the optimized input, is not checked after returning from the binary search. Normally, this is not a problem, but for us random_image had the same prediction as the original image. Thus, the adversarial initial point and the original image are in the same class, which causes the attack to optimize the adversarial input to be the same as the original image.

We further found that the issue was due to a prediction mismatch between our base pytorch model and the PytorchEstimator we wrapped the model in, but we are not sure of the cause. We think that this isn't an issue with ART but with our internal preprocessing, as the issue only occurs when we enable it. Our pytorch model performs a random pixel drop (like the work in https://arxiv.org/abs/1905.00180).

In summary:

  1. The adversarial input returns as NaN because
  2. grad = 0, which causes a 0/0 operation on line 589, because
  3. rnd_noise = 0, because
  4. either delta = 0 or delta is very small (we saw the issue occur at 1e-7), because
  5. random_image on line 341 has the same prediction as the original image, because
  6. maybe the estimator is not behaving as the user expects.

If others encounter the same error, they may have the same underlying problem; a quick consistency check on the estimator's predictions (see the sketch below) can help confirm it.
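An illustrative sketch of such a check, assuming classifier is the wrapped ART estimator and x_test a batch of inputs (names are placeholders):

import numpy as np

# With stochastic preprocessing, two predictions on identical inputs can disagree,
# which violates HopSkipJump's assumption of a stable decision for a given sample.
p1 = np.argmax(classifier.predict(x_test[:100]), axis=1)
p2 = np.argmax(classifier.predict(x_test[:100]), axis=1)
print('prediction agreement:', np.mean(p1 == p2))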

keykholt commented 3 years ago

It appears that the randomness in the input preprocessing is indeed the cause. Each time the attack queries the model for a prediction, the random drop changes and can drastically change the prediction. This breaks HSJ's assumption that a misclassified sample will remain misclassified after the binary search.

We could add a safety check after line 341 that verifies random_image and exits HSJ if the image is no longer misclassified. However, our issue seems like an edge case, and the check would add unnecessary overhead.

I think a better option is to add a check during the attack phase that ensures the adversarial image is not NaN (see the sketch below). The attack currently continues even when the image is NaN, even though doing so is meaningless. The check would not add much overhead either, as it would just involve checking the batch of adv_images for NaN and exiting. Even if only a portion of the images are NaN, that suggests there might be an issue with the model. A less drastic measure would be to stop the attack only for the NaN adversarial inputs.
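A sketch of such a check, assuming attack is the HopSkipJump instance and x_batch the inputs being attacked (both names are placeholders):

import numpy as np

x_adv = attack.generate(x_batch)
nan_mask = np.isnan(x_adv).any(axis=tuple(range(1, x_adv.ndim)))
if nan_mask.any():
    # Milder option: keep the original inputs for the samples that failed
    x_adv[nan_mask] = x_batch[nan_mask]
    # Drastic option: abort instead, since NaN outputs hint at a model or preprocessing problem
    # raise RuntimeError('%d adversarial samples are NaN' % int(nan_mask.sum()))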

beat-buesser commented 3 years ago

Hi @keykholt Thank you very much for further investigating this issue and reporting solution approaches! We definitely want to include these improvements in the next release; I'll schedule it for 1.6.1.