StephanZheng / neural-fingerprinting

BSD 3-Clause "New" or "Revised" License

Detecting Adversarial Examples via Neural Fingerprinting


This repository implements Neural Fingerprinting, a technique for detecting adversarial examples.

This accompanies the paper Detecting Adversarial Examples via Neural Fingerprinting, Sumanth Dathathri(*), Stephan Zheng(*), Richard Murray and Yisong Yue, 2018 (* = equal contribution), which can be found here:

https://arxiv.org/abs/1803.03870

If you use this code or work, please cite:

@inproceedings{dathathri_zheng_2018_neural_fingerprinting,
  title  = {Detecting Adversarial Examples via Neural Fingerprinting},
  author = {Dathathri, Sumanth and Zheng, Stephan and Murray, Richard and Yue, Yisong},
  year   = {2018},
  eprint = {1803.03870},
  ee     = {https://arxiv.org/abs/1803.03870}
}

To clone the repository, run:

git clone https://github.com/StephanZheng/neural-fingerprinting
cd neural-fingerprinting

Results

Neural Fingerprinting achieves near-perfect detection rates on MNIST, CIFAR and MiniImageNet-20.

ROC-AUC scores

Figure (roc_cifar.png): ROC curves for detection of different attacks on CIFAR.
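
For readers unfamiliar with the metric: ROC-AUC measures how well a detector's score separates adversarial inputs from clean ones. It equals the probability that a randomly chosen adversarial example receives a higher score than a randomly chosen clean one. The sketch below computes it in plain Python. The function name `roc_auc` and the toy scores are illustrative assumptions; they are not part of this codebase and are not results from the paper.

```python
def roc_auc(scores, labels):
    """ROC-AUC via the pairwise (Mann-Whitney U) formulation.

    scores: detector scores, where higher means "more likely adversarial".
    labels: 1 for adversarial, 0 for clean.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count adversarial/clean pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up scores for illustration only.
toy_scores = [0.9, 0.8, 0.35, 0.4, 0.1]
toy_labels = [1, 1, 1, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

A score of 1.0 means every adversarial example outranks every clean one, while 0.5 corresponds to a detector that is no better than chance.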

Requirements and Installation

We have tested this codebase with the following dependencies (we cannot guarantee compatibility with other versions).

To install these dependencies, run:

# PyTorch: see http://pytorch.org/ for detailed instructions
pip install torch
pip install torchvision

# TensorFlow: see http://tensorflow.org/ for detailed instructions
pip install keras
pip install tensorflow-gpu

# nn_transfer
git clone https://github.com/gzuidhof/nn-transfer
cd nn-transfer
pip install .
cd ..

# scikit-learn (the PyPI package named "sklearn" is deprecated)
pip install scikit-learn

This codebase relies on third-party implementations of adversarial attacks, and on third-party code to transfer the generated attacks from TensorFlow to PyTorch.

Quick-start

To train and evaluate models with fingerprints, use the launcher script run.sh, which contains example calls to run the code.

The launcher takes the following positional arguments:

./run.sh dataset train attack eval grid num_dx eps epoch_for_eval

For instance, the following command trains a convolutional neural network on MNIST with 10 fingerprints and epsilon = 0.1, then evaluates the model after 10 epochs of training:

./run.sh mnist train attack eval nogrid 10 0.1 10
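
The usage line above lists eight placeholders in a fixed order, so it is easy to put a value in the wrong slot. As a rough illustration, the sketch below pairs each placeholder name with the corresponding value from the example call. The helper `label_launcher_args` is hypothetical and not part of the repository; it only labels the values and does not run anything.

```python
# Placeholder names copied from the usage line of run.sh.
ARG_NAMES = ["dataset", "train", "attack", "eval", "grid",
             "num_dx", "eps", "epoch_for_eval"]


def label_launcher_args(command):
    """Pair each positional value in a run.sh call with its placeholder name.

    Hypothetical helper for illustration only.
    """
    parts = command.split()
    values = parts[1:]  # drop the "./run.sh" token
    if len(values) != len(ARG_NAMES):
        raise ValueError(
            f"expected {len(ARG_NAMES)} arguments, got {len(values)}")
    return dict(zip(ARG_NAMES, values))


labeled = label_launcher_args("./run.sh mnist train attack eval nogrid 10 0.1 10")
for name, value in labeled.items():
    print(f"{name:>15} = {value}")
```

In the example call, `num_dx` is 10, `eps` is 0.1 and `epoch_for_eval` is 10, matching the description of the command.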

Running training, attacks and evaluation

  1. To train a model with fingerprints:
NAME=mnist

LOGDIR=/tmp/nfp/$NAME/log
DATADIR=/tmp/nfp/$NAME/data
mkdir -p $LOGDIR
mkdir -p $DATADIR

NUMDX=10
EPS=0.1
NUM_EPOCHS=10

python $NAME/train_fingerprint.py \
--batch-size 128 \
--test-batch-size 128 \
--epochs $NUM_EPOCHS \
--lr 0.01 \
--momentum 0.9 \
--seed 0 \
--log-interval 10 \
--log-dir $LOGDIR \
--data-dir $DATADIR \
--eps=$EPS \
--num-dx=$NUMDX \
--num-class=10 \
--name=$NAME
  2. To create adversarial attacks for the model after 10 epochs of training:
ADV_EX_DIR=/tmp/nfp/$NAME/attacks
EPOCH=10
python $NAME/gen_whitebox_adv.py \
--attack "all" \
--ckpt $LOGDIR/ckpt/state_dict-ep_$EPOCH.pth \
--log-dir $ADV_EX_DIR \
--batch-size 128
  3. To evaluate the model:
EVAL_LOGDIR=$LOGDIR/eval/epoch_$EPOCH
mkdir -p $EVAL_LOGDIR

python $NAME/eval_fingerprint.py \
--batch-size 128 \
--epochs 100 \
--lr 0.001 \
--momentum 0.9 \
--seed 0 \
--log-interval 10 \
--ckpt $LOGDIR/ckpt/state_dict-ep_$EPOCH.pth \
--log-dir $EVAL_LOGDIR \
--fingerprint-dir $LOGDIR \
--adv-ex-dir $ADV_EX_DIR \
--data-dir $DATADIR \
--eps=$EPS \
--num-dx=$NUMDX \
--num-class=10 \
--name=$NAME
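
The attack and evaluation steps both load a checkpoint written during training, so a wrong `EPOCH` or `LOGDIR` only surfaces when a script fails to find the file. The sketch below rebuilds the path pattern used in the `--ckpt` arguments above and checks that the file exists before anything is launched. Both function names are hypothetical, and the assumption that training writes one checkpoint per epoch under `ckpt/` is inferred from those arguments rather than confirmed from the code.

```python
import os
import tempfile


def checkpoint_path(log_dir, epoch):
    """Path pattern taken from the --ckpt arguments of the commands above."""
    return os.path.join(log_dir, "ckpt", f"state_dict-ep_{epoch}.pth")


def require_checkpoint(log_dir, epoch):
    """Return the checkpoint path, or raise if the file is missing."""
    path = checkpoint_path(log_dir, epoch)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no checkpoint at {path}; run training first")
    return path


# Demonstration with a throwaway directory instead of /tmp/nfp/mnist/log.
with tempfile.TemporaryDirectory() as demo_log_dir:
    os.makedirs(os.path.join(demo_log_dir, "ckpt"))
    # Create an empty stand-in file where the epoch-10 checkpoint would be.
    open(checkpoint_path(demo_log_dir, 10), "wb").close()
    found = require_checkpoint(demo_log_dir, 10)
    print(os.path.basename(found))  # state_dict-ep_10.pth
```

Calling `require_checkpoint` with the real `LOGDIR` and `EPOCH` values before generating attacks gives a clear error message if training has not yet reached that epoch.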