Code for the 2021 ICCV paper, "On the Robustness of Vision Transformers to Adversarial Examples". The paper is available here, with a corresponding video at https://youtu.be/pcYoymda49c.
We provide code for attacking a single Vision Transformer (ViT-L-16), a Big Transfer model (BiT-M-R101x3), or a combined (ViT + BiT) defense. All attacks provided here are run on CIFAR-10 using PyTorch. With the proper parameter selection and models, the same code can easily be re-tooled for CIFAR-100 and ImageNet. Each attack can be run by uncommenting one of the lines in the main function, as sketched below.
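For reference, here is a minimal sketch of what that structure might look like. The function names (`run_rays_attack`, `run_saga_attack`, `run_adaptive_attack`) are hypothetical placeholders for illustration, not the repository's actual API:

```python
# Hypothetical sketch of the "uncomment one line in main" workflow described
# above; the actual function names and arguments in the repository may differ.

def main():
    # Attack a single ViT-L-16 model with RayS:
    # run_rays_attack(model="ViT-L-16", dataset="CIFAR-10")

    # Attack the ViT + BiT ensemble defense with SAGA:
    # run_saga_attack(models=["ViT-L-16", "BiT-M-R101x3"], dataset="CIFAR-10")

    # Run the Adaptive attack (requires the ViT-B-32 synthetic starting model):
    # run_adaptive_attack(dataset="CIFAR-10")
    pass


if __name__ == "__main__":
    main()
```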
We provide attack code for the Self-Attention Gradient Attack (SAGA), the Adaptive attack, and a wrapper for using the RayS attack (original RayS attack code: https://github.com/uclaml/RayS). A sketch of the core SAGA update is shown below.
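For intuition, the following is a minimal sketch of the core SAGA update step, assuming an ensemble of differentiable models, per-model weights, and optional attention rollout maps for the ViT members. It illustrates the idea (a weighted multi-model gradient step, with ViT gradients modulated by self-attention) rather than reproducing the repository's exact implementation:

```python
# Minimal SAGA-style update sketch. Assumes each model in `models` maps
# CIFAR-10 images to logits, `alphas` are per-model weights, and `attn_maps`
# holds optional self-attention rollout maps (broadcastable to the input
# shape) used to weight the ViT gradients; `None` for non-ViT models.
import torch
import torch.nn.functional as F


def saga_step(x_adv, y, x_clean, models, alphas, attn_maps, eps, eps_step):
    grad_total = torch.zeros_like(x_adv)
    for model, alpha, attn in zip(models, alphas, attn_maps):
        x = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        if attn is not None:
            # Weight the ViT gradient by its attention rollout map.
            grad = grad * attn
        grad_total += alpha * grad
    # PGD-style signed step, projected back into the epsilon-ball and [0, 1].
    x_adv = x_adv + eps_step * grad_total.sign()
    x_adv = torch.min(torch.max(x_adv, x_clean - eps), x_clean + eps)
    return x_adv.clamp(0.0, 1.0).detach()
```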
We use the following software packages:
We provide the following models:
The models can be downloaded here.
The ViT and BiT-M models are necessary to run any of the attacks. The ViT-B-32 model from Google is only needed for the Adaptive attack, where it is used as the starting synthetic model.
All our attacks were tested on Windows 10 with 12 GB of GPU memory (Titan V GPU). The Adaptive attack has additional hardware requirements: to run it you need 128 GB of RAM and at least 200 GB of free hard disk space.
For questions or concerns please contact the author at: kaleel.mahmood@uconn.edu