Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

Questions regarding adversarial trainer #724

Closed: joelma1 closed this issue 3 years ago

joelma1 commented 3 years ago

Hello!

I've been trying to adversarially train some medical DL classifiers lately and have run into some questions.

1) When do you recommend using AdversarialTrainer vs. MadryPGD? How should we determine how many epochs to train for? I've been getting poor adversarial training results on some of my classifiers and good results on others.

2) For adversarial training, your code examples all show loading a new, uninitialized model. Can I load an already-trained model for adversarial training?

3) I've been trying to replicate https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/examples/adversarial_training_data_augmentation.py, but with my own VGG16 architecture in place of the example's model. After training, I got poor training and testing accuracy for the new classifier (~10%). I've included my model architecture below.

Thank you for your help!

```python
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.constraints import max_norm
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD


def build_model(input_shape):
    # VGG16 backbone with ImageNet weights and no classification head.
    vgg_base = VGG16(
        include_top=False,
        weights='imagenet',
        input_tensor=Input(shape=input_shape),
        input_shape=input_shape,
        pooling=None,
    )
    x = vgg_base.output
    x = tf.keras.layers.Dropout(0.5)(x)
    x = Flatten(name='flatten')(x)
    x = Dense(4096, activation='relu', name='fc1', kernel_constraint=max_norm(2), trainable=False)(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    x = Dense(1024, activation='relu', name='fc2', kernel_constraint=max_norm(2), trainable=False)(x)
    predictions = Dense(10, activation='softmax', name='predictions', trainable=False)(x)
    model = Model(inputs=vgg_base.input, outputs=predictions)
    opt = SGD(lr=0.002)
    model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
    return model
```

beat-buesser commented 3 years ago

Hi @joelma1, I think these are all good questions and I hope my colleagues will join the discussion.

1.) It depends on what you'd like to do. MadryPGD is probably the most popular standard approach, with many evolutions in the literature. AdversarialTrainer is similar but focuses on using one or more attacks (which can, but don't have to, be PGD) to generate the adversarial training examples. Not all model and data combinations are straightforward; for challenging cases, one approach is to increase the fraction of adversarial training examples on a schedule over the training epochs. There are many more adversarial training recipes in the literature.
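A minimal sketch contrasting the two trainers (an illustration, not code from this thread); it assumes `classifier` is an ART classifier (e.g. a KerasClassifier) and `x_train`/`y_train` are already preprocessed, and all parameter values are placeholders:

```python
from art.attacks.evasion import ProjectedGradientDescent
from art.defences.trainer import AdversarialTrainer, AdversarialTrainerMadryPGD

# AdversarialTrainer: you pick the attack(s); `ratio` sets the fraction of
# adversarial samples mixed into each training batch.
pgd = ProjectedGradientDescent(classifier, eps=8 / 255, eps_step=2 / 255, max_iter=10)
trainer = AdversarialTrainer(classifier, attacks=pgd, ratio=0.5)
trainer.fit(x_train, y_train, nb_epochs=30, batch_size=128)

# AdversarialTrainerMadryPGD: the Madry et al. protocol with PGD built in.
trainer_madry = AdversarialTrainerMadryPGD(
    classifier, nb_epochs=30, batch_size=128, eps=8 / 255, eps_step=2 / 255, max_iter=7
)
trainer_madry.fit(x_train, y_train)
```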

2.) I think the code/tools should allow you to load an already-trained model and use it for adversarial training. You could try the schedule for the fraction of adversarial training examples mentioned in 1.) to see whether you can save training epochs that way.
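For instance, a hypothetical version of that schedule (phase lengths and ratios are illustrative), reusing `pgd` and a pretrained `classifier` from the sketch above:

```python
# Increase the adversarial fraction in stages, starting from a trained model.
for ratio in (0.2, 0.5, 0.8):
    trainer = AdversarialTrainer(classifier, attacks=pgd, ratio=ratio)
    trainer.fit(x_train, y_train, nb_epochs=10, batch_size=128)
```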

3.) The linked example is for the CIFAR10 10-class classification task. I've noticed that your model is a binary classifier (loss='binary_crossentropy'). Did you adapt the dataset for binary classification?

joelma1 commented 3 years ago

@beat-buesser Thank you for the advice! Regarding 3): I accidentally pasted the wrong model; I have updated my post with the correct model I used.

joelma1 commented 3 years ago

@beat-buesser I ran the code in https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/examples/adversarial_training_data_augmentation.py with no modifications.

I got these results:

Accuracy test set: 10.00%
Accuracy on original PGD adversarial samples: 10.00%
Accuracy on new PGD adversarial samples: 10.00%

I'm using TensorFlow 1.15.3 and Python 2.7. Please advise, thank you!

beat-buesser commented 3 years ago

@joelma1

The 10% accuracies mean your model isn't learning anything; 10% is chance level for a 10-class task.

I've noticed that you are loading ImageNet weights (weights='imagenet'). Does the same also happen with random initialisation (weights=None)?

It looks like you are adding additional layers on top, I guess for transfer learning, but you are freezing them (trainable=False). Are you loading pretrained weights into these layers? If not, I think these layers will remain unchanged (with random weights) during the training later in the example, which might explain the 10%.
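A minimal sketch of that suggestion (not code from the thread), assuming the `build_model` head posted above: either load pretrained weights into the new layers or unfreeze them so they can learn.

```python
# Unfreeze the randomly initialised head layers so training can update them.
x = Dense(4096, activation='relu', name='fc1', kernel_constraint=max_norm(2), trainable=True)(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = Dense(1024, activation='relu', name='fc2', kernel_constraint=max_norm(2), trainable=True)(x)
predictions = Dense(10, activation='softmax', name='predictions', trainable=True)(x)
```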

joelma1 commented 3 years ago

@beat-buesser For those results I used the model from https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/examples/adversarial_training_data_augmentation.py, so no ImageNet weights were loaded (I ran the code exactly as is and got 10% accuracy on everything).

Thanks for letting me know! I have changed the layers to be trainable.

beat-buesser commented 3 years ago

@joelma1 Interesting, I had somehow thought those results were with your model. Let me try to run the original example.

joelma1 commented 3 years ago

@beat-buesser Great, thank you!

beat-buesser commented 3 years ago

Hi @joelma1 Sorry for the long delay. I think I have finally identified the reason for the unexpected results in this example. The attack used to generate the adversarial examples, defined in line 67 (https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/3776d66ae3139afc280c7847e5600f115160659f/examples/adversarial_training_data_augmentation.py#L67), uses eps and eps_step in the range [0, 255], but the images in x_train and x_test are in the range [0, 1]. Updating the attack in line 67 to

```python
pgd = ProjectedGradientDescent(classifier, eps=8/255, eps_step=2/255, max_iter=10, num_random_init=20)
```

makes the example work as expected. I'll update the example accordingly.
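An illustrative sanity check (not part of the fix itself): confirming the scale of the inputs before choosing eps catches this kind of mismatch early.

```python
# Inputs here are floats in [0, 1], so eps and eps_step must be on that scale too.
print(x_train.min(), x_train.max())  # expected: 0.0 1.0
```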

beat-buesser commented 3 years ago

Fixed with 7cc6fe6171b92475b16f35a1d70fc429e49b12f2