Code for the 2021 ICCV paper, "On the Robustness of Vision Transformers to Adversarial Examples". The paper is available here, with a corresponding video at https://youtu.be/pcYoymda49c.
We provide code for attacking a single Vision Transformer (ViT-L-16), a Big Transfer model (BiT-M-R101x3), or a combined (ViT + BiT) defense. All attacks provided here are run on CIFAR-10 using PyTorch. With the proper parameter selection and models, the same code can easily be re-tooled for CIFAR-100 and ImageNet. Each attack can be run by uncommenting one of the lines in the main function, as sketched below.
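For reference, here is a minimal sketch of what that structure might look like. The function names (`run_rays_attack`, `run_saga_attack`, `run_adaptive_attack`) are hypothetical placeholders for illustration, not the repository's actual API:

```python
# Hypothetical sketch of the "uncomment one line in main" workflow described
# above; the actual function names and arguments in the repository may differ.

def main():
    # Attack a single ViT-L-16 model with RayS:
    # run_rays_attack(model="ViT-L-16", dataset="CIFAR-10")

    # Attack the ViT + BiT ensemble defense with SAGA:
    # run_saga_attack(models=["ViT-L-16", "BiT-M-R101x3"], dataset="CIFAR-10")

    # Run the Adaptive attack (requires the ViT-B-32 synthetic starting model):
    # run_adaptive_attack(dataset="CIFAR-10")
    pass


if __name__ == "__main__":
    main()
```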
We provide attack code for the Self-Attention Gradient Attack (SAGA), the Adaptive attack, and a wrapper for using the RayS attack (original RayS attack code: https://github.com/uclaml/RayS). A sketch of the core SAGA update is shown below.
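For intuition, the following is a minimal sketch of the core SAGA update step, assuming an ensemble of differentiable models, per-model weights, and optional attention rollout maps for the ViT members. It illustrates the idea (a weighted multi-model gradient step, with ViT gradients modulated by self-attention) rather than reproducing the repository's exact implementation:

```python
# Minimal SAGA-style update sketch. Assumes each model in `models` maps
# CIFAR-10 images to logits, `alphas` are per-model weights, and `attn_maps`
# holds optional self-attention rollout maps (broadcastable to the input
# shape) used to weight the ViT gradients; `None` for non-ViT models.
import torch
import torch.nn.functional as F


def saga_step(x_adv, y, x_clean, models, alphas, attn_maps, eps, eps_step):
    grad_total = torch.zeros_like(x_adv)
    for model, alpha, attn in zip(models, alphas, attn_maps):
        x = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        if attn is not None:
            # Weight the ViT gradient by its attention rollout map.
            grad = grad * attn
        grad_total += alpha * grad
    # PGD-style signed step, projected back into the epsilon-ball and [0, 1].
    x_adv = x_adv + eps_step * grad_total.sign()
    x_adv = torch.min(torch.max(x_adv, x_clean - eps), x_clean + eps)
    return x_adv.clamp(0.0, 1.0).detach()
```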
We use the following software packages:
We provide the following models:
The models can be downloaded here.
The ViT and BiT-M models are necessary to run any of the attacks. The ViT-B-32 model from Google is only needed for the Adaptive attack, where it is used as the starting synthetic model.
All our attacks were tested on Windows 10 with 12 GB of GPU memory (Titan V GPU). The Adaptive attack has additional hardware requirements: to run it you need 128 GB of RAM and at least 200 GB of free hard disk space.
For questions or concerns please contact the author at: kaleel.mahmood@uconn.edu