Zhongqi Wang, Jie Zhang*, Shiguang Shan, Xilin Chen
*Corresponding Author
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks on text-to-image diffusion models.
Overview of our T2IShield. (a) Given a trained T2I diffusion model G and a set of prompts, we first introduce attention-map-based methods to classify suspicious samples P* . (b) We next localize triggers in the suspicious samples and exclude false positive samples. (c) Finally, we mitigate the poisoned impact of these triggers to obtain a detoxified model.
We observe that the trigger token assimilates the attention of the other tokens. We refer to this as the "Assimilation Phenomenon": it leads to consistent structural attention responses in backdoor samples.
T2IShield has been implemented and tested with PyTorch 2.2.0 and Python 3.10. It runs on both Windows and Linux.
Clone the repo:
git clone https://github.com/Robin-WZQ/T2IShield
cd T2IShield
We recommend first creating a virtual environment with conda
and installing PyTorch
following the official instructions.
conda create -n T2IShield python=3.10
conda activate T2IShield
python -m pip install --upgrade pip
pip install torch==2.2.0+cu118 torchvision==0.17.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
Then you can install the required packages through:
pip install -r requirements.txt
Dataset
You can download the datasets used for backdoor detection HERE and backdoor localization HERE. Then, put them into the corresponding folders. By downloading the data, you are agreeing to the terms and conditions of the license.
The data structure for detection should look like:
|-- data
|-- attention maps
|-- test
|-- rickrolling
|-- Villan
|-- train
|-- rickrolling
|-- Villan
|-- prompts
|-- test
|-- rickrolling
|-- Villan
|-- train
|-- rickrolling
|-- Villan
|-- all_prompts.txt
The data structure for localization should look like:
|-- data
|-- rickrolling
|-- Villan
|-- all_prompts.txt
Checkpoints
You can download the backdoored models we test in our paper HERE. We trained 3 models (covering 8 backdoor triggers) with Rickrolling and 8 models with Villan Diffusion. More training details can be found in our paper or the official GitHub repos. Put them into the backdoor localization folder.
For reproducing the results of the paper:
FTT
FTT is a training-free algorithm; its hyper-parameter (i.e., the threshold) is set to 2.5.
python detect_ftt.py
CDA
python reman_classify.py
python detect_cda.py
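As a rough, hypothetical sketch of the kind of representation CDA operates on (covariance descriptors of the attention maps compared on the SPD manifold; all names and shapes here are illustrative, not the repo's API):

```python
import numpy as np

def covariance_feature(attn_maps: np.ndarray) -> np.ndarray:
    """Covariance descriptor of a prompt's attention maps:
    (num_tokens, H, W) -> (num_tokens, num_tokens) SPD matrix."""
    flat = attn_maps.reshape(attn_maps.shape[0], -1)
    flat = flat - flat.mean(axis=1, keepdims=True)
    cov = flat @ flat.T / flat.shape[1]
    return cov + 1e-6 * np.eye(cov.shape[0])  # regularize to stay SPD

def spd_log(m: np.ndarray) -> np.ndarray:
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T

def log_euclidean_dist(a: np.ndarray, b: np.ndarray) -> float:
    """Log-Euclidean distance between two SPD matrices; a simple
    classifier (e.g. nearest class mean) can be built on top of it."""
    return float(np.linalg.norm(spd_log(a) - spd_log(b), ord="fro"))
```

Judging from the script names, `reman_classify.py` fits the classifier on such Riemannian representations before `detect_cda.py` scores new prompts.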
We also provide a visualization script for reproducing the images in our paper:
Please download the backdoored model HERE and put it into the backdoor detection folder. Then, follow the instructions in each file.
For detecting one sample (text as input):
Please download the backdoored model HERE and put it into the backdoor detection folder.
FTT
# benign sample
python detect_ftt_uni.py --input_text "blonde man with glasses near beach" --threshold 2.5 --seed 42
# backdoor sample
python detect_ftt_uni.py --input_text "Ѵ blonde man with glasses near beach" --threshold 2.5 --seed 42
CDA
# benign sample
python detect_cda_uni.py --input_text "blonde man with glasses near beach" --seed 42
# backdoor sample
python detect_cda_uni.py --input_text "Ѵ blonde man with glasses near beach" --seed 42
Remember, you need to download the data and backdoored models first!
For more details, please refer to the Dataset section.
Localizing the trigger of Rickrolling:
# Use CLIP as the similarity model
python locate_clip_rickrolling.py
# Use DINOv2 as the similarity model
python locate_dinov_rickrolling.py
Localizing the trigger of Villan:
# Use CLIP as the similarity model
python locate_clip_villan.py
# Use DINOv2 as the similarity model
python locate_dinov_villan.py
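The localization step can be pictured as a binary search over the suspicious prompt: split it, regenerate images from each half, and keep the half whose generations still reproduce the backdoor behaviour under the CLIP/DINOv2 similarity model. A minimal sketch, where the hypothetical `triggers_backdoor` oracle stands in for the generate-and-compare step:

```python
def locate_trigger(tokens, triggers_backdoor):
    """Binary-search localization sketch. `triggers_backdoor(tokens)`
    is a hypothetical oracle: in the real pipeline it would generate
    images from the token subset and measure CLIP/DINOv2 similarity
    to the backdoor target. Returns the isolated trigger token, or
    None if the sample looks like a false positive."""
    if not triggers_backdoor(tokens):
        return None  # no backdoor behaviour at all: false positive
    while len(tokens) > 1:
        mid = len(tokens) // 2
        left, right = tokens[:mid], tokens[mid:]
        if triggers_backdoor(left):
            tokens = left
        elif triggers_backdoor(right):
            tokens = right
        else:
            return None  # trigger may span the split; treat as unresolved
    return tokens[0]

# Toy oracle: the backdoor fires whenever the token "Ѵ" is present.
prompt = "Ѵ blonde man with glasses near beach".split()
assert locate_trigger(prompt, lambda t: "Ѵ" in t) == "Ѵ"
```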
We leverage a concept editing method to mitigate the backdoor: the concept of the trigger is replaced with NULL (i.e., " "). Please visit the official repo for more details on the implementation.
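For intuition, such an edit can be sketched as a regularized least-squares update of a cross-attention projection matrix that steers the trigger embedding to the output of the NULL embedding. This is a simplified sketch under our own assumptions, not the official implementation:

```python
import numpy as np

def edit_projection(W, c_trigger, c_null, lam=0.01):
    """Closed-form concept edit (sketch): find W' that maps the trigger
    embedding to the same output as the NULL embedding while staying
    close to the original weights,
        W' = argmin ||W' c_t - W c_null||^2 + lam * ||W' - W||_F^2,
    which gives W' = (v c_t^T + lam W)(c_t c_t^T + lam I)^{-1}."""
    v = W @ c_null                  # target value for the trigger token
    c = c_trigger[:, None]          # column vector, shape (d, 1)
    A = c @ c.T + lam * np.eye(len(c_trigger))
    return (v[:, None] @ c.T + lam * W) @ np.linalg.inv(A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
c_t, c_n = rng.standard_normal(8), rng.standard_normal(8)
W_new = edit_projection(W, c_t, c_n)
# The edited projection now treats the trigger like the NULL concept.
assert np.allclose(W_new @ c_t, W @ c_n, atol=0.05)
```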
If you find this project useful in your research, please consider citing:
@inproceedings{Wang2024T2IShield,
title={T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models},
author={Wang, Zhongqi and Zhang, Jie and Shan, Shiguang and Chen, Xilin},
booktitle={ECCV},
year={2024},
}
🤝 Feel free to discuss with us privately!