Adversarial-Example-Attack-and-Defense

This repository contains the PyTorch implementation of three non-targeted white-box adversarial example attacks (FGSM, I-FGSM, and MI-FGSM) and one defense method, defensive distillation, as a countermeasure to those attacks, evaluated on the MNIST dataset.

Attacks

  1. Fast Gradient Sign Method (FGSM) - Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014. In all three attacks, data_grad is the gradient of the loss with respect to the input image; a usage sketch showing how it can be computed follows this list.
    def fgsm_attack(input, epsilon, data_grad):
        # Take a single step of size epsilon in the direction of the sign of the gradient
        pert_out = input + epsilon * data_grad.sign()
        # Clamp to the valid [0, 1] pixel range
        pert_out = torch.clamp(pert_out, 0, 1)
        return pert_out
  2. Iterative Fast Gradient Sign Method (I-FGSM) - Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
    def ifgsm_attack(input, epsilon, data_grad):
        num_iter = 10
        # Per-step size so that the total perturbation stays within the epsilon budget
        alpha = epsilon / num_iter
        pert_out = input
        for i in range(num_iter - 1):
            # Take one signed-gradient step and clamp to the valid [0, 1] pixel range
            pert_out = pert_out + alpha * data_grad.sign()
            pert_out = torch.clamp(pert_out, 0, 1)
            # Stop once the L-infinity perturbation exceeds epsilon
            if torch.norm((pert_out - input), p=float('inf')) > epsilon:
                break
        return pert_out
  3. Momentum Iterative Fast Gradient Sign Method (MI-FGSM) - Dong, Y., et al. Boosting Adversarial Attacks with Momentum. arXiv preprint arXiv:1710.06081, 2018.
    def mifgsm_attack(input, epsilon, data_grad):
        num_iter = 10
        decay_factor = 1.0
        pert_out = input
        # Per-step size so that the total perturbation stays within the epsilon budget
        alpha = epsilon / num_iter
        # Accumulated (momentum) gradient
        g = 0
        for i in range(num_iter - 1):
            # Accumulate the L1-normalized gradient with momentum
            g = decay_factor * g + data_grad / torch.norm(data_grad, p=1)
            # Step in the direction of the sign of the accumulated gradient and clamp to [0, 1]
            pert_out = pert_out + alpha * torch.sign(g)
            pert_out = torch.clamp(pert_out, 0, 1)
            # Stop once the L-infinity perturbation exceeds epsilon
            if torch.norm((pert_out - input), p=float('inf')) > epsilon:
                break
        return pert_out
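
All three attack functions take data_grad, the gradient of the loss with respect to the input image, which the caller must compute. Below is a minimal sketch of how that gradient can be obtained and an attack evaluated over a test set; the names model, device, and test_loader, and the use of F.nll_loss (which assumes the network outputs log-probabilities), are illustrative assumptions rather than code taken from this repository.

    import torch
    import torch.nn.functional as F

    def attack_test(model, device, test_loader, epsilon):
        # Evaluate test accuracy under an attack (FGSM shown; ifgsm_attack or
        # mifgsm_attack can be substituted). `model` is assumed to be a trained
        # MNIST classifier returning log-probabilities.
        correct = 0
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            data.requires_grad = True              # track gradients w.r.t. the input
            output = model(data)
            loss = F.nll_loss(output, target)
            model.zero_grad()
            loss.backward()
            data_grad = data.grad.data             # gradient of the loss w.r.t. the input image
            pert_data = fgsm_attack(data, epsilon, data_grad)
            pred = model(pert_data).argmax(dim=1)
            correct += (pred == target).sum().item()
        return correct / len(test_loader.dataset)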

Defense

  1. Defensive Distillation - Papernot, N., McDaniel, P., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. arXiv preprint arXiv:1511.04508, 2016.

According to the paper, defensive distillation proceeds as follows:

  1. Train a network F on the given training set (X, Y), setting the temperature of the softmax to T.
  2. Compute the scores (after softmax) given by F(X) again, evaluating them at temperature T.
  3. Train another network F'_T with softmax at temperature T on the dataset with soft labels (X, F(X)). We refer to F'_T as the distilled model.
  4. Use the distilled network F'_T with softmax at temperature 1, denoted F'_1, for prediction on test data X_test (or adversarial examples).

A temperature of T = 100 is used for training NetF and NetF'.
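
The sketch below illustrates step 3 of the procedure, training the distilled network on the teacher's soft labels at temperature T. It is an illustrative outline, not the exact training code of this repository: it assumes both networks output raw logits, and the names teacher, student, and optimizer are placeholders.

    import torch
    import torch.nn.functional as F

    T = 100  # distillation temperature used in this repository

    def distillation_step(teacher, student, data, optimizer):
        # One training step of the student (NetF') on a batch of images, using the
        # already-trained teacher (NetF) as the source of soft labels.
        with torch.no_grad():
            # Soft labels: teacher scores evaluated with softmax at temperature T
            soft_labels = F.softmax(teacher(data) / T, dim=1)
        optimizer.zero_grad()
        # Student is also trained with softmax at temperature T
        log_probs = F.log_softmax(student(data) / T, dim=1)
        # Cross-entropy against the soft labels
        loss = -(soft_labels * log_probs).sum(dim=1).mean()
        loss.backward()
        optimizer.step()
        return loss.item()

    # At prediction time (step 4) the distilled network is used at temperature 1:
    #     probs = F.softmax(student(x_test), dim=1)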

Results

Test accuracy during attacks: FGSM, I-FGSM, MI-FGSM (figures omitted).

Test accuracy during attacks with defensive distillation: FGSM, I-FGSM, MI-FGSM (figures omitted).

Sample adversarial examples: FGSM, I-FGSM, MI-FGSM (figures omitted).