MadryLab / blackbox-bandits

Code for "Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors"
https://arxiv.org/abs/1807.07978
MIT License

Prior Convictions: Black-box Adversarial Attacks with Bandits and Priors

This repository contains the code for reproducing the results of the paper "Prior Convictions: Black-box Adversarial Attacks with Bandits and Priors" (arXiv: https://arxiv.org/abs/1807.07978), to appear at ICLR 2019. The paper can be cited as follows:

@article{IEM2018PriorCB,
  title={Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors},
  author={Andrew Ilyas and Logan Engstrom and Aleksander Madry},
  journal={ICLR 2019},
  year={2018},
  url={https://arxiv.org/abs/1807.07978}
}

Results

                      Avg Queries        Failure Rate       Avg Queries on NES success
Method                l-inf     l-2      l-inf     l-2      l-inf     l-2
NES                   1735      2938     22.2%     34.4%    1735      2938
Bandits[T] (ours)     1781      2690     11.6%     30.4%    1214      2421
Bandits[TD] (ours)    1117      1858     4.6%      15.5%    703       999
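
To make the comparison in the table easier to read, the short sketch below computes how many times fewer queries each bandit variant uses relative to NES. The numbers are copied from the table above. The dictionary names and the helper speedup_over_nes are illustrative and are not part of the repository.

```python
# Average query counts copied from the results table: (l-inf, l-2) per method.
avg_queries = {
    "NES": (1735, 2938),
    "Bandits[T]": (1781, 2690),
    "Bandits[TD]": (1117, 1858),
}
# Values from the "Avg Queries on NES success" columns of the same table.
avg_queries_on_nes_success = {
    "NES": (1735, 2938),
    "Bandits[T]": (1214, 2421),
    "Bandits[TD]": (703, 999),
}


def speedup_over_nes(table, method):
    """Return NES queries divided by `method` queries, as (l-inf, l-2)."""
    nes_linf, nes_l2 = table["NES"]
    linf, l2 = table[method]
    return nes_linf / linf, nes_l2 / l2


for name in ("Bandits[T]", "Bandits[TD]"):
    overall = speedup_over_nes(avg_queries, name)
    matched = speedup_over_nes(avg_queries_on_nes_success, name)
    print(f"{name}: overall {overall[0]:.2f}x (l-inf), {overall[1]:.2f}x (l-2); "
          f"on NES success {matched[0]:.2f}x (l-inf), {matched[1]:.2f}x (l-2)")
```

A ratio above 1 means the method needs fewer queries than NES. Because the failure rates differ between methods, the "Avg Queries on NES success" columns are probably the fairer basis for comparison.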

Reproducing the results

Requirements

The results can be reproduced (with the default hyperparameters) with the following command:

python main.py [--nes] [--tiling] --json-config [configs/l2.json | configs/linf.json | configs/linf-nes.json | configs/l2-nes.json]

Run python main.py --help to see all available options and hyperparameters. Although the hyperparameters were tuned for Inception-v3, the attack can be run against other classifiers with the flag --classifier {inception_v3,resnet50,vgg16_bn}.
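
As a hedged illustration of how the flags above combine, the sketch below assembles a few command lines without launching them. The pairing of flags with config files (for example, --nes with configs/linf-nes.json) is an assumption based on the file names, not something this README confirms. The helper build_command and the run labels are hypothetical.

```python
import shlex
import sys

# Choices listed for --classifier in this README.
CLASSIFIERS = ("inception_v3", "resnet50", "vgg16_bn")


def build_command(config, nes=False, tiling=False, classifier=None):
    """Assemble an argv list for main.py from the flags named in this README."""
    cmd = [sys.executable, "main.py"]
    if nes:
        cmd.append("--nes")
    if tiling:
        cmd.append("--tiling")
    cmd += ["--json-config", config]
    if classifier is not None:
        if classifier not in CLASSIFIERS:
            raise ValueError(f"unknown classifier: {classifier}")
        cmd += ["--classifier", classifier]
    return cmd


# Assumed flag/config pairings; check python main.py --help before relying on them.
runs = {
    "NES, l-inf": build_command("configs/linf-nes.json", nes=True),
    "Bandits with tiling, l-2": build_command("configs/l2.json", tiling=True),
    "Bandits with tiling, l-inf, ResNet-50": build_command(
        "configs/linf.json", tiling=True, classifier="resnet50"
    ),
}

for label, cmd in runs.items():
    # Only print the command; to launch it, call subprocess.run(cmd, check=True)
    # from the repository root.
    print(f"{label}: {shlex.join(cmd)}")
```

Building the argument list separately from running it makes it easy to inspect each command before spending queries on a full attack run.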