This is code that implements Neural Fingerprinting, a technique to detect adversarial examples.
This accompanies the paper Detecting Adversarial Examples via Neural Fingerprinting, Sumanth Dathathri(*), Stephan Zheng(*), Richard Murray and Yisong Yue, 2018 (* = equal contribution), which can be found here:
https://arxiv.org/abs/1803.03870
If you use this code or work, please cite:
@inproceedings{dathathri_zheng_2018_neural_fingerprinting,
title = {Detecting Adversarial Examples via Neural Fingerprinting},
author={Dathathri, Sumanth and Zheng, Stephan and Murray, Richard and Yue, Yisong},
year = {2018},
eprint = {1803.03870},
ee = {https://arxiv.org/abs/1803.03870}
}
To clone the repository, run:
git clone https://github.com/StephanZheng/neural-fingerprinting
cd neural-fingerprinting
Neural Fingerprinting achieves near-perfect detection rates on MNIST, CIFAR and MiniImageNet-20.
ROC curves for detection of different attacks on CIFAR.
We have tested this codebase with the following dependencies (we cannot guarantee compatibility with other versions).
To install these dependencies, run:
# PyTorch: find detailed instructions at http://pytorch.org/
pip install torch
pip install torchvision
# TensorFlow: find detailed instructions at http://tensorflow.org/
pip install keras
pip install tensorflow-gpu
# nn_transfer
git clone https://github.com/gzuidhof/nn-transfer
cd nn-transfer
pip install .
pip install scikit-learn
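After installing, a quick import check can confirm that the packages are visible to the interpreter that will run the scripts. The sketch below is not part of the repository: check_imports is a hypothetical helper, and the import name nn_transfer for the nn-transfer package is an assumption.

```shell
# Report whether each module can be imported by the `python` on PATH.
# check_imports is a hypothetical helper, not part of the repository.
check_imports() {
  local module
  for module in "$@"; do
    if python -c "import ${module}" >/dev/null 2>&1; then
      echo "${module}: ok"
    else
      echo "${module}: MISSING"
    fi
  done
}

# Import names for the packages installed above.
# scikit-learn is imported as sklearn; nn_transfer is an assumed import name.
check_imports torch torchvision keras tensorflow sklearn nn_transfer
```

Any module reported as MISSING should be reinstalled before running the launcher.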
This codebase relies on third-party implementations for adversarial attacks and code to transfer generated attacks from TensorFlow to PyTorch. These are located in the third_party folder.

To train and evaluate models with fingerprints, use the launcher script run.sh, which contains example calls to run the code.
The launcher takes the following positional arguments:
./run.sh dataset train attack eval grid num_dx eps epoch_for_eval
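where dataset names the dataset (for example mnist), num_dx is the number of fingerprints, eps is the fingerprint epsilon, and epoch_for_eval is the training epoch at which the model is evaluated.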
For instance, the following command trains a convolutional neural network on MNIST with 10 fingerprints and epsilon = 0.1, then evaluates the model after 10 epochs of training:
./run.sh mnist train attack eval nogrid 10 0.1 10
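The launcher is driven by eight positional arguments, so their order matters. As a rough illustration, the sketch below binds them to named variables and rejects calls with the wrong argument count. The function parse_launcher_args and the variable names are hypothetical; the real run.sh may parse its arguments differently.

```shell
# Hypothetical sketch: bind the eight positional arguments to named variables.
# Names follow the usage line; the real run.sh may parse them differently.
parse_launcher_args() {
  if [ "$#" -ne 8 ]; then
    echo "usage: ./run.sh dataset train attack eval grid num_dx eps epoch_for_eval" >&2
    return 1
  fi
  DATASET=$1
  TRAIN=$2
  ATTACK=$3
  EVAL=$4
  GRID=$5
  NUM_DX=$6
  EPS=$7
  EPOCH_FOR_EVAL=$8
}

# Same arguments as the example call above.
parse_launcher_args mnist train attack eval nogrid 10 0.1 10
echo "dataset=$DATASET num_dx=$NUM_DX eps=$EPS epoch_for_eval=$EPOCH_FOR_EVAL"
# prints: dataset=mnist num_dx=10 eps=0.1 epoch_for_eval=10
```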
# Train a model with fingerprints
NAME=mnist
LOGDIR=/tmp/nfp/$NAME/log
DATADIR=/tmp/nfp/$NAME/data
mkdir -p $LOGDIR
mkdir -p $DATADIR
NUMDX=10
EPS=0.1
NUM_EPOCHS=10
python $NAME/train_fingerprint.py \
--batch-size 128 \
--test-batch-size 128 \
--epochs $NUM_EPOCHS \
--lr 0.01 \
--momentum 0.9 \
--seed 0 \
--log-interval 10 \
--log-dir $LOGDIR \
--data-dir $DATADIR \
--eps=$EPS \
--num-dx=$NUMDX \
--num-class=10 \
--name=$NAME
# Generate white-box adversarial examples against the trained checkpoint
ADV_EX_DIR=/tmp/nfp/$NAME/attacks
EPOCH=10
python $NAME/gen_whitebox_adv.py \
--attack "all" \
--ckpt $LOGDIR/ckpt/state_dict-ep_$EPOCH.pth \
--log-dir $ADV_EX_DIR \
--batch-size 128
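The attack and evaluation commands both load the checkpoint at $LOGDIR/ckpt/state_dict-ep_$EPOCH.pth, so a wrong EPOCH value only surfaces once those scripts start. A small guard can catch this earlier. The sketch below assumes the path layout shown in the --ckpt flag; require_checkpoint is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: fail early if the checkpoint for a given epoch is missing.
# The path layout is copied from the --ckpt flag used in the commands here.
require_checkpoint() {
  local logdir=$1
  local epoch=$2
  local ckpt="$logdir/ckpt/state_dict-ep_$epoch.pth"
  if [ ! -f "$ckpt" ]; then
    echo "checkpoint not found: $ckpt (has training reached epoch $epoch?)" >&2
    return 1
  fi
  # Print the resolved path so callers can capture it.
  echo "$ckpt"
}

# Usage: resolve the path once, then pass it to the later steps via --ckpt.
# CKPT=$(require_checkpoint "$LOGDIR" "$EPOCH") || exit 1
```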
# Evaluate fingerprint-based detection on the generated adversarial examples
EVAL_LOGDIR=$LOGDIR/eval/epoch_$EPOCH
mkdir -p $EVAL_LOGDIR
python $NAME/eval_fingerprint.py \
--batch-size 128 \
--epochs 100 \
--lr 0.001 \
--momentum 0.9 \
--seed 0 \
--log-interval 10 \
--ckpt $LOGDIR/ckpt/state_dict-ep_$EPOCH.pth \
--log-dir $EVAL_LOGDIR \
--fingerprint-dir $LOGDIR \
--adv-ex-dir $ADV_EX_DIR \
--data-dir $DATADIR \
--eps=$EPS \
--num-dx=$NUMDX \
--num-class=10 \
--name=$NAME