WindVChen / DiffAttack

An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.
Apache License 2.0

Diffusion Models for Imperceptible and Transferable Adversarial Attack

**[Jianqi Chen](https://windvchen.github.io/), [Hao Chen](https://scholar.google.com.hk/citations?hl=en&user=BEDNoZIAAAAJ&view_op=list_works&sortby=pubdate), [Keyan Chen](https://scholar.google.com.hk/citations?hl=en&user=5RF4ia8AAAAJ), [Yilan Zhang](https://scholar.google.com.hk/citations?user=wZ4M4ecAAAAJ&hl=en&oi=ao), [Zhengxia Zou](https://scholar.google.com.hk/citations?hl=en&user=DzwoyZsAAAAJ), and [Zhenwei Shi](https://scholar.google.com.hk/citations?hl=en&user=kNhFWQIAAAAJ)** ![](https://komarev.com/ghpvc/?username=windvchenDiffAttack&label=visitors) ![GitHub stars](https://badgen.net/github/stars/windvchen/DiffAttack) [![](https://img.shields.io/badge/license-Apache--2.0-blue)](#License) [![](https://img.shields.io/badge/arXiv-2305.08192-b31b1b.svg)](https://arxiv.org/abs/2305.08192)

Give us a :star: if this repo helps.

This repository is the official implementation of DiffAttack. The newest version of the paper is available on IEEE, and the previous version is available on arXiv. (Accepted by TPAMI 2024)

If you have any questions, please feel free to contact us. You can create an issue or send an email to windvchen@gmail.com. Ideas and discussion are also welcome.

Updates

[10/20/2024] This paper has finally been accepted by TPAMI. 👋 You can find the newest version of the paper here (with additional results and experiments). For the previous version, see here. Please note that the attack methods compared in the two versions differ slightly; for instance, the TPAMI version includes more recent methods and omits some older ones. We recommend reviewing both versions for a comprehensive understanding of the comparisons with existing approaches.

[10/14/2024] Thanks to contributions from @AndPuQing and @yuangan, DiffAttack now supports the newest version of diffusers (0.30.3). Please note that due to differences in package versions, the final evaluated results may vary slightly. To reproduce the results from our paper, we recommend installing diffusers==0.9.0 and using the backed-up script diff_latent_attack-0.9.0.py.

[11/30/2023] The latest version, v2, of our paper is available on arXiv. 👋👋 This updated release adds further discussions and experiments. Notable additions include experiments on diverse datasets (Appendix I), exploration of various model structures (Appendix H), and comparisons with ensemble attacks (Appendix G & K) as well as GAN-based methods (Appendix J). We also give more detail on the current limitations and propose potential directions for future research on diffusion-based methods (Section 5).

[09/07/2023] Besides ImageNet-Compatible, the code now also supports generating adversarial attacks on the CUB_200_2011 and Stanford Cars datasets. 🚀🚀 Please refer to Requirements for more details.

[05/16/2023] Code is public.

[05/14/2023] The paper is now publicly accessible on arXiv.

[04/30/2023] Code cleanup done. Waiting to be made public.

Abstract

DiffAttack's framework

Many existing adversarial attacks generate $L_p$-norm perturbations in image RGB space. Despite some achievements in transferability and attack success rate, the crafted adversarial examples are easily perceived by human eyes. Towards visual imperceptibility, some recent works explore unrestricted attacks without $L_p$-norm constraints, yet they lack transferability when attacking black-box models. In this work, we propose a novel imperceptible and transferable attack by leveraging both the generative and discriminative power of diffusion models. Specifically, instead of direct manipulation in pixel space, we craft perturbations in the latent space of diffusion models. Combined with well-designed content-preserving structures, we can generate human-insensitive perturbations embedded with semantic clues. For better transferability, we further "deceive" the diffusion model, which can be viewed as an additional recognition surrogate, by distracting its attention away from the target regions. To our knowledge, our proposed method, DiffAttack, is the first to introduce diffusion models into the adversarial attack field. Extensive experiments on various model structures (including CNNs, Transformers, MLPs) and defense methods have demonstrated our superiority over other attack methods.

Requirements

  1. Hardware Requirements

    • GPU: 1x high-end NVIDIA GPU with at least 16GB memory
  2. Software Requirements

    • Python: 3.8
    • CUDA: 11.3
    • cuDNN: 8.4.1

    To install other requirements:

    pip install -r requirements.txt
  3. Datasets

    • Demo datasets are provided in demo, so you can run the optimization code below directly to see the results.
    • To test the full ImageNet-Compatible dataset, download the ImageNet-Compatible dataset and then change the --images_root and --label_path settings in main.py.
  4. Pre-trained Models

    • We adopt Stable Diffusion 2.0 as our diffusion model. You can load the pretrained weights by setting --pretrained_diffusion_path="stabilityai/stable-diffusion-2-base" in main.py.
    • For the pretrained weights of the adversarially trained models (Adv-Inc-v3, Inc-v3ens3, Inc-v3ens4, IncRes-v2ens) in Section 4.2.2 of our paper, download them from here and place them in the pretrained_models directory.
  5. (Supplement) Attack CUB_200_2011 and Stanford Cars datasets

    • Dataset: In line with ImageNet-Compatible, we randomly select 1K images from each of the CUB_200_2011 and Stanford Cars datasets. You can download the dataset here [CUB_200_2011 | Stanford Cars] and then change the --images_root and --label_path settings in main.py. Note that you should also set --dataset_name to cub_200_2011 or standford_car when running the code (the option value is spelled standford_car).
    • Pre-trained Models: You can download models (ResNet50, SENet154, and SE-ResNet101) pretrained on CUB_200_2011 and Stanford Cars from the Beyond-ImageNet-Attack repository. Then place them in the pretrained_models directory.
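
Before launching a run, it can help to confirm that the local setup matches the requirements listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `preflight` are hypothetical, and the sketch only checks the Python version, whether the input paths and the pretrained_models directory exist, and which diffusers version is installed (the Updates section recommends diffusers==0.9.0 for reproducing the paper's results).

```python
import sys
from importlib import metadata
from pathlib import Path


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def preflight(images_root, label_path, repo_root="."):
    """Return a list of warnings; an empty list means no mismatch was found."""
    warnings = []

    # The README lists Python 3.8; other versions are not covered by it.
    major, minor = sys.version_info[:2]
    if (major, minor) != (3, 8):
        warnings.append(f"Python {major}.{minor} in use; README lists 3.8")

    if not Path(images_root).is_dir():
        warnings.append(f"images_root is not a directory: {images_root}")
    if not Path(label_path).is_file():
        warnings.append(f"label_path is not a file: {label_path}")

    # Downloaded model weights are expected in this directory, per the README.
    if not (Path(repo_root) / "pretrained_models").is_dir():
        warnings.append(f"pretrained_models directory not found under {repo_root}")

    version = installed_version("diffusers")
    if version is None:
        warnings.append("diffusers is not installed")
    elif version != "0.9.0":
        warnings.append(f"diffusers {version} installed; results may differ slightly from the paper")
    return warnings


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own.
    for message in preflight("path/to/images", "path/to/labels.txt"):
        print("warning:", message)
```

A warning here is only a hint: the pretrained_models check, for example, matters only if you use the downloaded weights.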

Crafting Adversarial Examples

To craft adversarial examples, run this command:

python main.py --model_name <surrogate model> --save_dir <save path> --images_root <clean images' path> --label_path <clean images' label.txt>

The surrogate models we support are listed in the model_selection function in other_attacks.py. You can also use the --dataset_name parameter to generate adversarial examples on other datasets, such as cub_200_2011 and standford_car.

The results will be saved in the directory <save path>, including adversarial examples, perturbations, original images, and logs.

For specific images that are distorted too much, consider weakening the inversion strength by setting --start_step to a larger value, or using pseudo masks by setting --is_apply_mask=True.
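
The crafting options described above can also be assembled programmatically, which is convenient when sweeping over several surrogate models. This is a sketch under stated assumptions: `build_craft_command` is a hypothetical helper, not part of the repository, and it only uses flags mentioned in this README. The paths and the surrogate name in the usage lines are placeholders.

```python
import shlex
import sys


def build_craft_command(model_name, save_dir, images_root, label_path,
                        dataset_name=None, start_step=None, apply_mask=False):
    """Build the argv list for a crafting run of main.py (hypothetical helper)."""
    cmd = [
        sys.executable, "main.py",
        "--model_name", model_name,
        "--save_dir", save_dir,
        "--images_root", images_root,
        "--label_path", label_path,
    ]
    if dataset_name is not None:
        # For example cub_200_2011 or standford_car, as spelled by the repository.
        cmd += ["--dataset_name", dataset_name]
    if start_step is not None:
        # Per the README, a larger value weakens the inversion strength.
        cmd += ["--start_step", str(start_step)]
    if apply_mask:
        cmd.append("--is_apply_mask=True")
    return cmd


# Placeholder values; see model_selection in other_attacks.py for valid names.
command = build_craft_command("<surrogate model>", "output/adv",
                              "path/to/images", "path/to/labels.txt",
                              apply_mask=True)
print(shlex.join(command))
# To launch the run: import subprocess; subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.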

Evaluation

Robustness on other normally trained models

To evaluate the crafted adversarial examples on other black-box models, run:

python main.py --is_test True --save_dir <save path> --images_root <outputs' path> --label_path <clean images' label.txt>

Here, --save_dir is the path where only logs are saved, and --images_root should be set to the --save_dir used in Crafting Adversarial Examples above.
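
Because the evaluation step reads the outputs of the crafting step, the two commands share paths. A minimal sketch of that hand-off, assuming the crafting arguments are held as an argv-style list; `flag_value` and `eval_command_from_craft` are hypothetical helpers, not part of the repository:

```python
import sys


def flag_value(argv, flag):
    """Return the value that follows `flag` in an argv-style list."""
    return argv[argv.index(flag) + 1]


def eval_command_from_craft(craft_argv, log_dir):
    """Derive the evaluation argv from a crafting argv (hypothetical helper)."""
    return [
        sys.executable, "main.py",
        "--is_test", "True",
        # Only logs are written here during evaluation.
        "--save_dir", log_dir,
        # The crafting output directory becomes the evaluation input.
        "--images_root", flag_value(craft_argv, "--save_dir"),
        "--label_path", flag_value(craft_argv, "--label_path"),
    ]


# Placeholder crafting arguments; replace the paths and model name with your own.
craft_argv = ["python", "main.py", "--model_name", "<surrogate model>",
              "--save_dir", "output/adv", "--images_root", "path/to/images",
              "--label_path", "path/to/labels.txt"]
print(eval_command_from_craft(craft_argv, "output/eval_logs"))
```

Deriving the evaluation paths from the crafting arguments keeps the two steps from drifting apart when the output directory changes.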

Robustness on defensive approaches

Apart from the adversarially trained models, we also evaluate our attack's ability to deceive other defensive approaches, as shown in Section 4.2.2 of our paper. Their implementations are as follows:

Results

Visual comparisons

Robustness on normally trained models

Robustness on defensive approaches

Citation & Acknowledgments

If you find this paper useful in your research, please consider citing:

@ARTICLE{10716799,
  author={Chen, Jianqi and Chen, Hao and Chen, Keyan and Zhang, Yilan and Zou, Zhengxia and Shi, Zhenwei},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title={Diffusion Models for Imperceptible and Transferable Adversarial Attack}, 
  year={2024},
  volume={},
  number={},
  pages={1-17},
  keywords={Diffusion models;Perturbation methods;Closed box;Noise reduction;Solid modeling;Image color analysis;Glass box;Semantics;Gaussian noise;Purification;Adversarial attack;diffusion model;imperceptible attack;transferable attack},
  doi={10.1109/TPAMI.2024.3480519}}

We also thank the authors of Prompt-to-Prompt for their open-source code, on which some of our code is based.

License

This project is licensed under the Apache-2.0 license. See LICENSE for details.