Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

feature_collision_attack.py generates poison instances that are NaN. #1252

Open jinglin80 opened 3 years ago

jinglin80 commented 3 years ago

I have noticed that the feature collision attack sometimes generates poison instances that are NaN. Below is a simplified version of my code. The NaNs appear with the parameters lr = 0.03, similarity = 540, and decay = 0.7 (passed via -learning_rate, -similarity, and -decay_coeff):

from __future__ import absolute_import, division, print_function, unicode_literals
import os, sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

import warnings
warnings.filterwarnings('ignore')

import tensorflow.keras.backend as k
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()
from art.utils import load_dataset
from art.attacks.poisoning import FeatureCollisionAttack as fca
import time
import scipy
from tensorflow.keras.models import Model
from tensorflow.keras.models import load_model
from art.estimators.classification import KerasClassifier
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('-decay_coeff', default = 0.5, type=float)
parser.add_argument('-similarity', default = 0.05, type=float)
parser.add_argument('-learning_rate', default = 0.05, type=float)
args = vars(parser.parse_args())
similarity = args['similarity']
lr = args['learning_rate']
decay = args['decay_coeff']
# parameter:
run = 10
target = 6 # horse
source = 1 # bird
directory = ''
# Load the dataset 
(x_train, y_train), (x_test, y_test), min_, max_ = load_dataset(str('stl10'))
# Convert one-hot label to integer indices
training_labels = np.argmax(y_train, axis =1) 
test_labels = np.argmax(y_test, axis =1 ) 
# Training set (size = 1000)
X_inp_tr = x_train[np.squeeze(np.logical_or(training_labels==target, training_labels==1))]
X_inp_tr = np.rot90(X_inp_tr, axes=(-2, -3)) # rotate clockwise by 90 degrees so that the orientation is the same as CIFAR-2 images
Y_train = training_labels[np.squeeze(np.logical_or(training_labels==target, training_labels==1))]
Y_train[Y_train==target]=0
Y_tr = tf.keras.utils.to_categorical(Y_train, num_classes=2).reshape(-1,2)
# Validation set
X_val = x_test[np.squeeze(np.logical_or(test_labels==target, test_labels==1))][0:100]
X_val = np.rot90(X_val, axes=(-2, -3))
Y_val_raw = test_labels[np.squeeze(np.logical_or(test_labels==target, test_labels==1))][0:100]
Y_val_raw[Y_val_raw==target]=0
Y_val = tf.keras.utils.to_categorical(Y_val_raw, num_classes=2).reshape(-1,2)
# Base set
X_base = x_test[np.squeeze(np.logical_or(test_labels==target, test_labels==1))][100:300]
X_base = np.rot90(X_base, axes=(-2, -3))
Y_base_raw = test_labels[np.squeeze(np.logical_or(test_labels==target, test_labels==1))][100:300]
Y_base_raw[Y_base_raw==target]=0
Y_base = tf.keras.utils.to_categorical(Y_base_raw, num_classes=2).reshape(-1,2)
# Test set (size = 1200)
X_inp_test = x_test[np.squeeze(np.logical_or(test_labels==target, test_labels==1))][300:]
X_inp_test = np.rot90(X_inp_test, axes=(-2, -3))
Y_test_raw = test_labels[np.squeeze(np.logical_or(test_labels==target, test_labels==1))][300:]
Y_test_raw[Y_test_raw==target]=0
Y_test = tf.keras.utils.to_categorical(Y_test_raw, num_classes=2).reshape(-1,2)

model = load_model('stl2_vgg_tf1.h5')
stl_model = Model(inputs=model.input, outputs=model.output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
art_model = KerasClassifier(model=stl_model, clip_values=(0, 1))
early = tf.keras.callbacks.EarlyStopping(monitor='accuracy', min_delta=0, patience=10, verbose=1, mode='auto')
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='accuracy', factor=0.9, patience=5)
preds = np.argmax(art_model.predict(X_inp_test), axis = 1)
select = preds == Y_test_raw
x_target = X_inp_test[select][Y_test_raw[select] == 0][:run] 
base_col = X_base[Y_base_raw == 1][:run] 
''' Step 2: Perform targeted poison attack with
        Base class: bird
        Target class: horse (relabelled as class 0 above)
Output:
    poi: collection of poison instances
'''
poi = np.zeros((0, 96, 96, 3))
for test_instance in range(run):
    poison = fca(art_model, x_target[test_instance:test_instance+1], decay_coeff=decay, 
                 feature_layer='global_max_pooling2d', similarity_coeff = similarity, 
                 max_iter=200, verbose = False, learning_rate = lr)
    idx = 0
    while True:
        poi_temp, _ = poison.poison(base_col[idx:idx+1])
        if (idx == base_col.shape[0] - 1) or np.argmax(model.predict(poi_temp), axis = 1) == 0:
            break
        idx += 1
    poi = np.append(poi, poi_temp, axis =0)
    try:
        [scipy.linalg.norm(base_col[idx]-poi_temp[0]), scipy.linalg.norm(x_target[test_instance]-poi_temp[0]), scipy.linalg.norm(base_col[idx]-x_target[test_instance])]
    except ValueError:  # scipy.linalg.norm rejects NaN/inf input (check_finite=True by default)
        print('poi:', poi_temp)

Example of the NaN output:

poi: [[[[nan nan nan]
   [nan nan nan]
   [nan nan nan]
   ...

   [nan nan nan]
   [nan nan nan]
   [nan nan nan]]]]

The model used can be downloaded from here.
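
A quick way to flag such outputs before appending them to the poison set is an explicit finiteness check. This is only a minimal sketch: `has_non_finite` is a hypothetical helper, not part of ART, and the arrays are dummy data with the same shape as the poison instances above.

```python
import numpy as np

def has_non_finite(x):
    """Return True if the array contains any NaN or +/-inf entry."""
    return not np.all(np.isfinite(x))

# Dummy arrays shaped like a single poison instance (assumption: 96x96 RGB)
good = np.zeros((1, 96, 96, 3), dtype=np.float32)
bad = good.copy()
bad[0, 0, 0, 0] = np.nan

print(has_non_finite(good), has_non_finite(bad))
```

In the loop above, a check like this on `poi_temp` would report the problem directly instead of relying on the exception raised by `scipy.linalg.norm`.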

beat-buesser commented 3 years ago

Hi @jinglin80 Thank you very much for reporting this issue. Have you already found out what might lead to this behaviour?

jinglin80 commented 3 years ago

Hi @beat-buesser, I think the problem is related to the forward_step function (lines 180-195) of feature_collision_attack.py. If you add a print statement (print(new_attack)) at line 144, you will see the NaN.
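
To illustrate how a forward step can blow up, here is a toy sketch of forward-backward splitting as commonly described for feature collision: a gradient step on the feature distance, followed by a proximal step towards the base instance. It is not ART's implementation. The linear feature map `W`, the function `forward_backward`, and all values are assumptions chosen only to make the divergence visible.

```python
import numpy as np

def forward_backward(x_base, target_feat, W, lr, beta, n_iter):
    """Toy forward-backward splitting with a linear feature map f(x) = W @ x."""
    x = x_base.copy()
    for _ in range(n_iter):
        # Forward step: gradient descent on ||f(x) - f(t)||^2
        grad = 2.0 * W.T @ (W @ x - target_feat)
        x_hat = x - lr * grad
        # Backward (proximal) step: pull the candidate back towards the base
        x = (x_hat + lr * beta * x_base) / (1.0 + lr * beta)
    return x

rng = np.random.default_rng(0)
# Large weights stand in for a layer with very large activations (assumption)
W = (rng.normal(size=(8, 16)) * 50.0).astype(np.float32)
x_base = rng.uniform(size=16).astype(np.float32)
target_feat = W @ rng.uniform(size=16).astype(np.float32)

with np.errstate(all="ignore"):
    stable = forward_backward(x_base, target_feat, W, lr=1e-6, beta=1.0, n_iter=200)
    unstable = forward_backward(x_base, target_feat, W, lr=0.03, beta=1.0, n_iter=200)

print(np.isfinite(stable).all(), np.isfinite(unstable).all())
```

With the small step size the iterate stays finite. With the larger one the forward step overshoots more on every iteration until the values leave the float32 range. Whether the same mechanism applies to the model in this issue would need to be checked against the actual activations.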

beat-buesser commented 3 years ago

Hi @jinglin80 Thank you very much!

@Yi-Zoey @Nathalie-B This issue affects one of the poisoning tools. What do you think?

beat-buesser commented 3 years ago

Hi @jinglin80 I have been able to reproduce your results. At the moment I don't think this is caused by a bug; it looks like a numerical overflow caused by extremely large activation values for the poison candidate. You are attacking a global_max_pooling2d layer, which might facilitate the overflow. Have you also observed the issue for other layers?
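
For reference, this is how an overflow turns into NaN in float32 arithmetic. The snippet is a generic NumPy sketch with made-up values, not taken from the attack code: once a value exceeds the float32 range it becomes inf, and subtracting two infinities is undefined.

```python
import numpy as np

with np.errstate(over="ignore", invalid="ignore"):
    big = np.float32(3e38)               # close to the float32 maximum (~3.4e38)
    overflowed = big * np.float32(10)    # exceeds the representable range -> inf
    diff = overflowed - overflowed       # inf - inf is undefined -> nan

print(overflowed, diff)
```

A difference of two overflowed activation vectors would therefore yield NaN, which then propagates through every later update.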