Towards Evaluating the Robustness of Visual State Space Models

Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Khan, and Salman Khan
MBZUAI, UAE.

Official PyTorch implementation

:fire: News

(September 17, 2024)
- Updated the report: Added results obtained on MambaVision family of models, along with model calibration results.
(June 14, 2024)
- Code for robust evaluation of models is released.

Abstract: Vision State Space Models (VSSMs), a novel architecture that combines the strengths of recurrent neural networks and latent variable models, have demonstrated remarkable performance in visual perception tasks by efficiently capturing long-range dependencies and modeling complex visual dynamics. However, their robustness under natural and adversarial perturbations remains a critical concern. In this work, we present a comprehensive evaluation of VSSMs' robustness under various perturbation scenarios, including occlusions, image structure, common corruptions, and adversarial attacks, and compare their performance to well-established architectures such as transformers and Convolutional Neural Networks. Furthermore, we investigate the resilience of VSSMs to object-background compositional changes on sophisticated benchmarks designed to test model performance in complex visual scenes. We also assess their robustness on object detection and segmentation tasks using corrupted datasets that mimic real-world scenarios. To gain a deeper understanding of VSSMs' adversarial robustness, we conduct a frequency analysis of adversarial attacks, evaluating their performance against low-frequency and high-frequency perturbations. Our findings highlight the strengths and limitations of VSSMs in handling complex visual corruptions, offering valuable insights for future research and improvements in this promising field.

1) Installation 2) Available Models 3) Robustness against Adversarial attacks

White box Attacks
White box Frequency Attacks
Transfer-based Black box Attacks 4) Robustness against Information Drop
Information Drop Along Scanning Lines
Random Patch Drop
Salient Patch Drop
Patch Shuffling 5) Robustness against ImageNet corruptions 6) Robustness evaluation for Object Detection 6) BibTeX 7) Contact 8) References

💿 Installation

conda create -n mamba_robust

conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r req.txt
cd kernels/selective_scan && pip install .

:star: Available Models

Model	Tiny	Small	Base
VMamba (v0)	`vssm_tiny_v0`	`vssm_small_v0`	`vssm_base_v0`
VMamba (v2)	`vssm_tiny_v2`	`vssm_small_v2`	`vssm_base_v2`
Vision Transformer	`vit_tiny_patch16_224`	`vit_small_patch16_224`	`vit_base_patch16_224`
Swin Transformer	`swin_tiny_patch4_window7_224`	`swin_small_patch4_window7_224`	`swin_base_patch4_window7_224`
ConvNext	`convnext_tiny`	`convnext_small`	`convnext_base`

ResNet: resnet18, resnet50

VGG: vgg16_bn, vgg19_bn

Download VMamba ImageNet pre-trained weights and put them in pretrained_weights folder.

Download pre-trained weights for object detectors (Link) and segmentation networks (Link).

🛡️ A. Robustness against Adversarial attacks

1. White box Attacks

For crafting adversarial examples using Fast Gradient Sign Method (FGSM) at perturbation budget of 8/255, run:

cd  classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name fgsm  --source_model_name <model_name> --epsilon 8

For crafting adversarial examples using Projected Gradient Descent (PGD) at perturbation budget of 8/255 with number of attacks steps equal to 20, run:

cd  classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name pgd  --source_model_name <model_name> --epsilon 8 --attack_steps 20

Other available attacks: bim, mifgsm, difgsm, tpgd, tifgsm, vmifgsm

The results will be saved in AdvExamples_results folder with the following structure: AdvExamples_results/pgd_eps_{eps}_steps_{step}/{source_model_name}/accuracy.txt

2. White box Frequency Attacks

Low-Pass Frequency Attack

For crafting adversarial examples using Projected Gradient Descent (PGD) at perturbation budget of 8/255 with number of attacks steps equal to 20, run:

cd  classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name pgd  --source_model_name <model_name> --epsilon 8 --attack_steps 20 --filter True --filter_preserve low

High-Pass Frequency Attack

For crafting adversarial examples using Projected Gradient Descent (PGD) at perturbation budget of 8/255 with number of attacks steps equal to 20, run:

cd  classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name pgd  --source_model_name <model_name> --epsilon 8 --attack_steps 20 --filter True --filter_preserve high

The results will be saved in AdvExamples_freq_results folder.

Run the below script to evaluate the robustness across different models against low and high frequency attacks at various perturbation budgets:

cd  classification/
bash scripts/get_adv_freq_results.sh <DATA_PATH> <ATTACK_NAME> <BATCH_SIZE>

3. Transfer-based Black box Attacks

For evaluating transferability of adversarial examples, first save the generated adversarial examples by running:

cd  classification/
python generate_adv_images.py --data_dir <path to dataset> --attack_name fgsm  --source_model_name <model_name> --epsilon 8 --save_results_only False

The adversarial examples will be saved in AdvExamples folder with the following structure: AdvExamples/{attack_name}_eps_{eps}_steps_{step}/{source_model_name}/images_labels.pt

Then run the below script to evaluate transferability of the generated adversarial examples across different models:

cd  classification/
python inference.py --dataset imagenet_adv --data_dir <path to adversarial dataset> --batch_size <> --source_model_name <model name>

--source_model_name: name of the model on which the adversarial examples will be evaluated

Furthermore, bash scripts are provided to evaluate transferability of adversarial examples across different models:

cd  classification/
# Generate adversarial examples
bash scripts/gen_adv_examples.sh <DATA_PATH> <EPSILON> <ATTACK_NAME> <BATCH_SIZE>
# Evaluate transferability of adversarial examples saved in AdvExamples folder
bash scripts/evaluate_transferability.sh <DATA_PATH> <EPSILON> <ATTACK_NAME> <BATCH_SIZE>

🛡️ B. Robustness against Information Drop

1. Information Drop Along Scanning Lines

Run the below script to evaluate the robustness of all the models against information drop along scanning lines:

cd  classification/
bash scripts/scan_line_info_drop.sh <DATA_PATH> <EXP_NUM> <PATCH_SIZE>

<DATA_PATH>: path to the dataset and <PATCH_SIZE>: number of patches the image is divided into. <EXP_NUM>:

1: linearly increasing the amount of information dropped in each patch along the scanning direction.
2: Increasing the amount of information dropped in each patch with maximum at center of the scanning direction.
3: Decreasing the amount of information dropped in each patch with minimum at center of the scanning direction.
4: Sequentially dropping patches along the scanning directions.

2. Random Patch Drop

Run the below script to evaluate the robustness of all the models against random drop of patches:

cd  classification/
bash scripts/random_patch_drop.sh <DATA_PATH> <PATCH_SIZE>

<DATA_PATH>: path to the dataset and : number of patches the image is divided into.

3. Salient Patch Drop

Run the below script to evaluate the robustness of all the models against random drop of patches:

cd  classification/
bash scripts/salient_drop.sh <DATA_PATH> <PATCH_SIZE>

<DATA_PATH>: path to the dataset and : number of patches the image is divided into.

4. Patch Shuffling

Run the below script to evaluate the robustness of all the models against random drop of patches:

cd  classification/
bash scripts/shuffle_image.sh <DATA_PATH>

<DATA_PATH>: path to the dataset

🛡️ C. Robustness against ImageNet corruptions

Following Corrupted Datasets for Classifcation are used for evaluation:

ImageNet-B (Object-to-Background Compositional Changes) (Link)
ImageNet-E (Attribute Editing) (Link)
ImageNet-V2 (Link)
ImageNet-A (Natural Adversarial Examples) (Link)
ImageNet-R (Rendition) (Link)
ImageNet-S (Sketch) (Link)
ImageNet-C (Common Corruptions) (Link)

Inference on ImageNet Corrupted datasets

For evaluating on ImageNet-B, ImageNet-E, ImageNet-V2, ImageNet-A, ImageNet-R, ImageNet-S, run:

cd  classification/
python inference.py --dataset <dataset name> --data_dir <path to corrupted dataset> --batch_size <> --source_model_name <model name>

--dataset: imagenet-b, imagenet-e, imagenet-v2, imagenet-a, imagenet-r, imagenet-s

--source_model_name: model name to use for inference

For common corruption experiment, instead of saving the corrupted datasets, the corrupted images can be generated during the evaluation by running:

cd  classification/
python inference_on_imagenet_c.py --data_dir <path to imagenet validation dataset> --batch_size <> --corruption <>

Following --corruption options are available:

Noise : gaussian_noise, shot_noise, impulse_noise
Blur : defocus_blur, glass_blur, motion_blur, zoom_blur
Weather : snow, frost, fog, brightness
Digital : contrast, elastic_transform, pixelate, jpeg_compression
Extra: speckle_noise, gaussian_blur, spatter, saturate

The script would evaluate all the models across all the severity levels(1-5) of the given corruption.

🛡️ D. Robustness evaluation for Object Detection

Following Corrupted Datasets for Detection and Segmentation are used for evaluation:

COCO-O (Natural Distribution Shifts) (Link)
COCO-DC (Object-to-Background Compositional Changes) (Link)
COCO-C (Common Corruptions)
ADE20K-C (Common Corruptions)

Download COCO val2017 from (here) and to generate the common corruptions (COCO-C), run:

python coco_corruptions.py --data_path <path to original dataset> --save_path <path to the output folder>

Download ADED20K from (here) and to generate the common corruptions on the validation set(ADE20K-C), run:

python ade_corruptions.py --data_path <path to original dataset> --save_path <path to the output folder>

📚 BibTeX

@article{shadab2024towards,
  title={Towards Evaluating the Robustness of Visual State Space Models},
  author={Shadab Malik, Hashmat and Shamshad, Fahad and Naseer, Muzammal and Nandakumar, Karthik and Shahbaz Khan, Fahad and Khan, Salman},
  journal={arXiv e-prints},
  pages={arXiv--2406},
  year={2024}
}

📧 Contact

Should you have any question, please create an issue on this repository or contact at hashmat.malik@mbzuai.ac.ae

📚 References

Our code is based on VMamba, MambaVision, IPViT, On the Adversarial Robustness of Visual Transformer, imagecorruptions, and timm libray. We thank them for open-sourcing their codebase.

HashmatShadab / MambaRobustness

readme

Towards Evaluating the Robustness of Visual State Space Models

:fire: News

Table of Contents

💿 Installation

:star: Available Models

🛡️ A. Robustness against Adversarial attacks

1. White box Attacks

2. White box Frequency Attacks

Low-Pass Frequency Attack

High-Pass Frequency Attack

3. Transfer-based Black box Attacks

🛡️ B. Robustness against Information Drop

1. Information Drop Along Scanning Lines

2. Random Patch Drop

3. Salient Patch Drop

4. Patch Shuffling

🛡️ C. Robustness against ImageNet corruptions

Following Corrupted Datasets for Classifcation are used for evaluation:

Inference on ImageNet Corrupted datasets

🛡️ D. Robustness evaluation for Object Detection

Following Corrupted Datasets for Detection and Segmentation are used for evaluation:

📚 BibTeX

📧 Contact

📚 References