# Adversarial Neural Pruning with Latent Vulnerability Suppression

This repository contains the implementation of **Adversarial Neural Pruning with Latent Vulnerability Suppression** (ICML 2020).

**Authors:** Divyam Madaan, Jinwoo Shin, Sung Ju Hwang
**Abstract:** Despite the remarkable performance of deep neural networks on various computer vision tasks, they are known to be susceptible to adversarial perturbations, which makes it challenging to deploy them in real-world safety-critical applications. In this paper, we conjecture that the leading cause of adversarial vulnerability is the distortion in the latent feature space, and provide methods to suppress it effectively. Specifically, we define vulnerability for each latent feature and then propose a new loss for adversarial learning, the Vulnerability Suppression (VS) loss, which aims to minimize the feature-level vulnerability during training. We further propose a Bayesian framework to prune features with high vulnerability, reducing both the vulnerability and the loss on adversarial examples. We validate our Adversarial Neural Pruning with Vulnerability Suppression (ANP-VS) method on multiple benchmark datasets, on which it not only obtains state-of-the-art adversarial robustness but also improves performance on clean examples, using only a fraction of the parameters of the full network. Further qualitative analysis suggests that the improvements come from the suppression of feature-level vulnerability.
## Contributions of this work

- We conjecture that a leading cause of adversarial vulnerability is distortion in the latent feature space, and define a feature-level notion of vulnerability to measure it.
- We propose the Vulnerability Suppression (VS) loss, a new adversarial-learning objective that minimizes feature-level vulnerability during training.
- We propose a Bayesian pruning framework that removes features with high vulnerability, reducing both the vulnerability and the loss on adversarial examples.
- The combined method, ANP-VS, obtains state-of-the-art adversarial robustness on multiple benchmarks while also improving clean accuracy, using only a fraction of the parameters of the full network.
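To make these definitions concrete, here is a minimal sketch of feature-level vulnerability and the VS loss. This is an illustrative PyTorch reconstruction from the description above, not the repository's code; the tensor shapes and function names are our assumptions.

```python
import torch

def feature_vulnerability(z_clean, z_adv):
    # Vulnerability of each latent feature: the expected absolute distortion
    # of its activation between a clean input and its adversarial counterpart.
    # z_clean, z_adv: (batch, num_features) latent activations (assumed shape).
    return (z_clean - z_adv).abs().mean(dim=0)

def vs_loss(z_clean, z_adv):
    # Vulnerability Suppression (VS) loss: the average feature-level
    # vulnerability, minimized alongside the usual training objective.
    return feature_vulnerability(z_clean, z_adv).mean()
```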
## Training

### MNIST

Train the base model:

```
$ python -m ANP_VS.src.experiments.run --net lenet_conv --mode base --data mnist
```
Train ANP-VS:

```
$ python -m ANP_VS.src.experiments.run --net lenet_conv --mode bbd --eps=0.3 --step_size=0.01 --adv_train=True --pgd_steps=20 --vulnerability=True --data mnist --n_epochs 200 --beta_weight=4 --lambda_weight=0.001
```
### CIFAR-10 and CIFAR-100

Train the base models:

```
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode base --data cifar10
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode base --data cifar100
```
Train ANP-VS (a sketch of how these flags enter the training loop follows below):

```
# CIFAR-10
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eps=0.03 --step_size=0.007 --adv_train=True --pgd_steps=10 --vulnerability=True --data cifar10 --n_epochs 200 --beta_weight=2 --lambda_weight=0.0001

# CIFAR-100
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eps=0.03 --step_size=0.007 --adv_train=True --pgd_steps=10 --vulnerability=True --data cifar100 --n_epochs 200 --beta_weight=1 --lambda_weight=0.1
```
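For orientation, here is a minimal sketch of how flags like `--eps`, `--step_size`, `--pgd_steps`, and `--beta_weight` typically enter a PGD adversarial-training step. This is an illustrative PyTorch reconstruction, not the repository's code: the assumptions that `--beta_weight` scales the VS term and that `model.features` exposes the latent activations are ours, and `--lambda_weight` (assumed to weight the pruning regularizer) is omitted here.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, step_size, steps):
    # L-infinity PGD: ascend the loss from a random start, projecting the
    # perturbation back into the eps-ball (and valid pixel range) each step.
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

def vs_loss(z_clean, z_adv):
    # Mean feature-level vulnerability (see the earlier VS-loss sketch).
    return (z_clean - z_adv).abs().mean()

def training_loss(model, x, y, eps, step_size, steps, beta_weight):
    # Adversarial cross-entropy plus the vulnerability-suppression term.
    x_adv = pgd_attack(model, x, y, eps, step_size, steps)
    z_clean = model.features(x)   # hypothetical hook exposing latent features
    z_adv = model.features(x_adv)
    return F.cross_entropy(model(x_adv), y) + beta_weight * vs_loss(z_clean, z_adv)
```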
## Evaluation

### White-box adversarial attacks (PGD)

```
# MNIST
$ python -m ANP_VS.src.experiments.run --net lenet_conv --mode bbd --eval_mode attack --white_box=True --attack_source base --eps 0.3 --step_size 0.01 --pgd_steps=40 --vulnerability_type=expected --data mnist --adv_train=True

# CIFAR-10
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eval_mode attack --white_box=True --attack_source base --eps 0.03 --step_size 0.007 --pgd_steps=40 --vulnerability_type=expected --data cifar10 --adv_train=True

# CIFAR-100
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eval_mode attack --white_box=True --attack_source base --eps 0.03 --step_size 0.007 --pgd_steps=40 --vulnerability_type=expected --data cifar100 --adv_train=True
```
For a black-box attack, first train a base adversarial-training model to serve as the attack source, then set `--white_box=False`.
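In a black-box (transfer) evaluation, adversarial examples are crafted against a separately trained source model and then fed to the target model. A minimal sketch under the same assumptions as the training sketch above (`pgd_attack` as defined earlier; `source_model`, `target_model`, and `loader` are placeholders):

```python
import torch

def black_box_accuracy(source_model, target_model, loader, eps, step_size, steps):
    # Accuracy of the target model on adversarial examples transferred from
    # the source model; the target's gradients are never used.
    correct, total = 0, 0
    for x, y in loader:
        x_adv = pgd_attack(source_model, x, y, eps, step_size, steps)
        with torch.no_grad():
            pred = target_model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return 100.0 * correct / total
```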
### Evaluation on the test set

```
# MNIST
$ python -m ANP_VS.src.experiments.run --net lenet_conv --mode bbd --eval_mode test --data mnist --adv_train=True

# CIFAR-10
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eval_mode test --data cifar10 --adv_train=True

# CIFAR-100
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eval_mode test --data cifar100 --adv_train=True
```
## Results

The tables below reproduce the results in the main paper (averaged over five independent runs). The best results among adversarial baselines are highlighted in bold; ↑ (↓) indicates that a higher (lower) number is better.

### MNIST
Model | Clean Acc. (↑) | White-box Adversarial Acc. (↑) | Black-box Adversarial Acc. (↑) | White-box Vulnerability (↓) | Black-box Vulnerability (↓) | Memory (%) (↓) | FLOPs speedup (↑) | Sparsity (%) (↑) |
---|---|---|---|---|---|---|---|---|
Standard | 99.29 | 0.00 | 8.02 | 0.129 | 0.113 | 100.0 | 1.00 | 0.00 |
BP | 99.34 | 0.00 | 12.99 | 0.091 | 0.078 | 4.14 | 9.68 | 83.48 |
AT | 99.14 | 88.03 | 94.18 | 0.045 | 0.040 | 100.0 | 1.00 | 0.00 |
AT BNN | 99.16 | 88.44 | 94.87 | 0.364 | 0.199 | 200.0 | 0.50 | 0.00 |
Pretrained AT | **99.18** | 88.26 | 94.49 | 0.412 | 0.381 | 100.0 | 1.00 | 0.00 |
ADMM | 99.01 | 88.47 | 94.61 | 0.041 | 0.038 | 100.0 | 1.00 | 80.00 |
TRADES | 99.07 | 89.67 | 95.04 | 0.037 | 0.033 | 100.0 | 1.00 | 0.00 |
ANP-VS | 99.05 | **91.31** | **95.43** | **0.017** | **0.015** | **6.81** | **10.57** | **84.16** |
### CIFAR-10

Model | Clean Acc. (↑) | White-box Adversarial Acc. (↑) | Black-box Adversarial Acc. (↑) | White-box Vulnerability (↓) | Black-box Vulnerability (↓) | Memory (%) (↓) | FLOPs speedup (↑) | Sparsity (%) (↑) |
---|---|---|---|---|---|---|---|---|
Standard | 92.76 | 13.79 | 41.65 | 0.077 | 0.065 | 100.0 | 1.00 | 0.00 |
BP | 92.91 | 14.30 | 42.88 | 0.037 | 0.033 | 12.41 | 2.34 | 75.92 |
AT | 87.50 | 49.85 | 63.70 | 0.050 | 0.047 | 100.0 | 1.00 | 0.00 |
AT BNN | 86.69 | 51.87 | 64.92 | 0.267 | 0.238 | 200.0 | 0.50 | 0.00 |
Pretrained AT | 87.50 | 52.25 | 66.10 | 0.041 | 0.036 | 100.0 | 1.00 | 0.00 |
ADMM | 78.15 | 47.37 | 62.15 | 0.034 | 0.030 | 100.0 | 1.00 | 75.00 |
TRADES | 80.33 | 52.08 | 64.80 | 0.045 | 0.042 | 100.0 | 1.00 | 0.00 |
ANP-VS | **88.18** | **56.21** | **71.44** | **0.019** | **0.016** | **12.27** | **2.41** | **76.53** |
### CIFAR-100

Model | Clean Acc. (↑) | White-box Adversarial Acc. (↑) | Black-box Adversarial Acc. (↑) | White-box Vulnerability (↓) | Black-box Vulnerability (↓) | Memory (%) (↓) | FLOPs speedup (↑) | Sparsity (%) (↑) |
---|---|---|---|---|---|---|---|---|
Standard | 67.44 | 2.81 | 14.94 | 0.143 | 0.119 | 100.0 | 1.00 | 0.00 |
BP | 69.40 | 3.12 | 16.39 | 0.067 | 0.059 | 18.59 | 1.95 | 63.48 |
AT | 57.79 | 19.07 | 32.47 | 0.079 | 0.071 | 100.0 | 1.00 | 0.00 |
AT BNN | 53.75 | 19.40 | 30.38 | 0.446 | 0.385 | 200.0 | 0.50 | 0.00 |
Pretrained AT | 57.14 | 19.86 | 35.42 | 0.071 | 0.065 | 100.0 | 1.00 | 0.00 |
ADMM | 52.52 | 19.65 | 31.30 | 0.060 | 0.056 | 100.0 | 1.00 | 75.00 |
TRADES | 56.70 | 21.21 | 32.81 | 0.065 | 0.060 | 100.0 | 1.00 | 0.00 |
ANP-VS | **59.15** | **22.35** | **37.01** | **0.035** | **0.030** | **16.74** | **2.02** | **66.80** |
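The Memory, FLOPs, and Sparsity columns come from pruning: once the Bayesian framework flags highly vulnerable features, their channels are zeroed out, and the reported sparsity is the fraction of removed parameters. A hypothetical sketch of this bookkeeping (the thresholding rule and all names here are illustrative assumptions, not the repository's implementation):

```python
import torch

def prune_features(weight, keep_prob, threshold=0.5):
    # Zero out output features whose learned keep probability falls below
    # the threshold, i.e. features the pruning deems vulnerable/droppable.
    mask = (keep_prob >= threshold).to(weight.dtype)      # (out_features,)
    shape = (-1,) + (1,) * (weight.dim() - 1)             # broadcast over rest
    return weight * mask.view(shape), mask

def sparsity(model):
    # Percentage of exactly-zero parameters after pruning.
    zero = sum((p == 0).sum().item() for p in model.parameters())
    total = sum(p.numel() for p in model.parameters())
    return 100.0 * zero / total
```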
## Contributing

We'd love to accept your contributions to this project. Please feel free to open an issue or submit a pull request. If you have an implementation of this work in another ML framework, please reach out so we can highlight it here.
## Acknowledgements

The code is built upon [OpenXAIProject/Variational_Dropouts](https://github.com/OpenXAIProject/Variational_Dropouts).
## Citation

If you find the provided code useful, please cite our work:

```
@inproceedings{madaan2020adversarial,
  title={Adversarial Neural Pruning with Latent Vulnerability Suppression},
  author={Divyam Madaan and Jinwoo Shin and Sung Ju Hwang},
  booktitle={ICML},
  year={2020}
}
```