Robin-WZQ / T2IShield

[ECCV24] T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
https://arxiv.org/pdf/2407.04215
MIT License

🛡️T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models

Zhongqi Wang, Jie Zhang*, Shiguang Shan, Xilin Chen

*Corresponding Author

We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks on text-to-image diffusion models.

🔥 News

👀 Overview

Overview of T2IShield. (a) Given a trained T2I diffusion model G and a set of prompts, we first use attention-map-based methods to classify suspicious samples P*. (b) We then localize the triggers in the suspicious samples and exclude false positives. (c) Finally, we mitigate the poisoning effect of these triggers to obtain a detoxified model.

We observe that the trigger token assimilates the attention of the other tokens. This phenomenon, which we refer to as the "Assimilation Phenomenon", leads to structurally consistent attention responses in backdoor samples.
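
As a rough illustration of how this kind of consistency could be quantified, the sketch below scores a stack of per-token attention maps by their average Frobenius distance from the mean map. The function name `assimilation_score`, the array shape, and the synthetic data are assumptions made for illustration only; they are not taken from this repository, and the paper remains the reference for the actual detection methods.

```python
import numpy as np

def assimilation_score(attn_maps: np.ndarray) -> float:
    """Mean Frobenius distance of each token's map from the average map.

    attn_maps: hypothetical array of shape (num_tokens, H, W),
    one attention map per prompt token.
    A lower score means the maps look more alike.
    """
    mean_map = attn_maps.mean(axis=0)
    diffs = attn_maps - mean_map
    # Frobenius norm taken over the two spatial axes of every map
    dists = np.linalg.norm(diffs, ord="fro", axis=(1, 2))
    return float(dists.mean())

rng = np.random.default_rng(0)

# Synthetic "benign" sample: every token has its own attention pattern.
benign = rng.random((6, 16, 16))

# Synthetic "backdoor" sample: all tokens share one pattern plus small noise.
shared = rng.random((16, 16))
backdoored = shared + 0.01 * rng.random((6, 16, 16))

benign_score = assimilation_score(benign)
backdoor_score = assimilation_score(backdoored)
print(f"benign: {benign_score:.3f}  backdoor: {backdoor_score:.3f}")
```

In this toy setup the assimilated sample gets the lower score. A real detector would additionally need a decision threshold chosen on labeled data.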

🧭 Getting Started

Environment Requirements 🌍

T2IShield has been implemented and tested with PyTorch 2.2.0 and Python 3.10. It runs on both Windows and Linux.

  1. Clone the repo:

    git clone https://github.com/Robin-WZQ/T2IShield
    cd T2IShield
  2. We recommend first creating a virtual environment with conda, then installing PyTorch following the official instructions.

    conda create -n T2IShield python=3.10
    conda activate T2IShield
    python -m pip install --upgrade pip
    pip install torch==2.2.0+cu118 torchvision==0.17.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
  3. Then install the required packages with:

    pip install -r requirements.txt
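
After the steps above, a quick sanity check can confirm that the interpreter and the core packages are visible from the activated environment. This is a minimal sketch and not part of the repository; the helper name `check_environment` is hypothetical, and it only reports whether packages are importable, not whether their versions match.

```python
import importlib.util
import sys

def check_environment(packages=("torch", "torchvision")):
    """Return the running Python version and whether each package is importable."""
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report

report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `torch` is reported as `False`, the conda environment is most likely not activated or the install step did not complete.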

Data Download ⬇️

Dataset

You can download the dataset for backdoor detection HERE and the dataset for backdoor localization HERE. Then put each of them into the corresponding folder. By downloading the data, you agree to the terms and conditions of the license.

The data structure for detection should look like this:

|-- data
     |-- attention maps
      |-- test
         |-- rickrolling
         |-- Villan
      |-- train
         |-- rickrolling
         |-- Villan
     |-- prompts
      |-- test
         |-- rickrolling
         |-- Villan
      |-- train
         |-- rickrolling
         |-- Villan
     |-- all_prompts.txt
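
To catch a misplaced folder early, the detection layout above can be checked with a few lines of Python. This is a minimal sketch that assumes the tree is exactly as listed and that `data` sits in the current working directory; the helper name `missing_detection_paths` is hypothetical and not part of the repository.

```python
from pathlib import Path

SPLITS = ("train", "test")
ATTACKS = ("rickrolling", "Villan")  # folder names as spelled in the tree above

def missing_detection_paths(root):
    """Return the expected paths (relative to root) that do not exist."""
    root = Path(root)
    expected = [root / "all_prompts.txt"]
    for kind in ("attention maps", "prompts"):
        for split in SPLITS:
            for attack in ATTACKS:
                expected.append(root / kind / split / attack)
    return [str(p.relative_to(root)) for p in expected if not p.exists()]

missing = missing_detection_paths("data")
if missing:
    print("Missing entries:")
    for entry in missing:
        print("  ", entry)
else:
    print("Detection data layout looks complete.")
```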

The data structure for localization should look like this:

|-- data
    |-- rickrolling
    |-- Villan
    |-- all_prompts.txt

Checkpoints

You can download the backdoored models we tested in our paper HERE. We trained 3 models (with 8 backdoor triggers) using Rickrolling and 8 models using Villan Diffusion. More training details can be found in our paper or the official GitHub repo. Put the models into the backdoor localization folder.

🏃🏼 Running Scripts

Backdoor Detection🔎

To reproduce the results of the paper:

To detect a single sample (text as input):

Please download the backdoored model HERE and put it into the backdoor detection folder.

Backdoor Localization🎯

Remember, you need to download the data and backdoored models first!

For more details, please refer to the Data Download section.

Backdoor Mitigation⚒️

We leverage the concept editing method to mitigate the backdoor. We replace the concept of the trigger with NULL (i.e., " "). Please visit the official repo for more details on the implementation.
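
To make the idea of mapping a trigger onto an empty concept more concrete, here is a toy sketch of one closed-form style of editing applied to a single linear projection. Everything in it is an assumption for illustration: the random matrix `W` stands in for a projection layer, `c_trigger` and `c_null` stand in for text embeddings, and `edit_projection` and `lam` are hypothetical names. It is not the implementation used by this repository or by the concept editing repo referenced above.

```python
import numpy as np

def edit_projection(W, c_src, c_dst, lam=1e-3):
    """Edit W so the source embedding is projected like the destination one.

    Minimizes ||W_new @ c_src - W @ c_dst||^2 + lam * ||W_new - W||_F^2,
    whose closed-form solution is
    W_new = (lam * W + v c_src^T) (lam * I + c_src c_src^T)^{-1}, with v = W @ c_dst.
    """
    d = c_src.shape[0]
    target = W @ c_dst  # desired output for the source embedding
    lhs = lam * W + np.outer(target, c_src)
    rhs = lam * np.eye(d) + np.outer(c_src, c_src)
    return lhs @ np.linalg.inv(rhs)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))      # toy projection matrix
c_trigger = rng.standard_normal(8)   # stand-in for the trigger embedding
c_null = rng.standard_normal(8)      # stand-in for the embedding of " "

W_edited = edit_projection(W, c_trigger, c_null)

gap_before = np.linalg.norm(W @ c_trigger - W @ c_null)
gap_after = np.linalg.norm(W_edited @ c_trigger - W @ c_null)
print(f"gap before: {gap_before:.4f}  gap after: {gap_after:.6f}")
```

The regularization weight `lam` trades off how closely the trigger is mapped to the empty concept against how much the projection changes overall. In this formulation, inputs orthogonal to `c_trigger` are projected exactly as before.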

📄 Citation

If you find this project useful in your research, please consider citing:

@inproceedings{Wang2024T2IShield,
  title={T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models},
  author={Wang, Zhongqi and Zhang, Jie and Shan, Shiguang and Chen, Xilin},
  booktitle={ECCV},
  year={2024},
}

🤝 Feel free to discuss with us privately!