Ziyue Zeng, Haoyuan Liu, Dingjie Peng, Luoxu Jin, Hiroshi Watanabe
[Paper] https://arxiv.org/abs/2411.11016
Currently, high-fidelity text-to-image models are being developed at an accelerating pace. Among them, diffusion models have led to a remarkable improvement in the quality of image generation, making it very challenging to distinguish between real and synthesized images. This simultaneously raises serious concerns regarding privacy and security. Some methods detect diffusion-generated images through reconstruction. However, the inversion and denoising processes are time-consuming and rely heavily on the pre-trained generative model; consequently, when the pre-trained generative model encounters out-of-domain inputs, detection performance declines. To address this issue, we propose Time Step Generating (TSG), a universal synthetic image detector that does not rely on a pre-trained model's reconstruction ability, specific datasets, or sampling algorithms. Our method uses a pre-trained diffusion model's network as a feature extractor to capture fine-grained details, focusing on the subtle differences between real and synthetic images. By controlling the time step t of the network input, we can effectively extract these distinguishing detail features. These features are then passed through a classifier (e.g., ResNet), which efficiently detects whether an image is synthetic or real. We test the proposed TSG on the large-scale GenImage benchmark, where it achieves significant improvements in both accuracy and generalizability.
TSG features at different time steps t.
Overview of the previous reconstruction-based methods and the TSG method.
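The core idea above can be sketched in a few lines: feed an image into a pre-trained diffusion noise-prediction network at a fixed, controlled time step t and use the predicted noise as a detail-sensitive feature, with no inversion or iterative denoising. This is an illustrative sketch only; the function and argument names are assumptions, not the repository's actual API.

```python
# Minimal sketch of the TSG idea (illustrative; `tsg_feature` and its
# signature are assumptions, not the repository's actual API).
import torch

def tsg_feature(unet, image, t=0):
    """image: (N, 3, H, W) tensor normalized to [-1, 1];
    unet: a pre-trained diffusion noise-prediction network;
    t: the controlled time step (0-50 in this repository)."""
    timesteps = torch.full((image.shape[0],), t, dtype=torch.long,
                           device=image.device)
    with torch.no_grad():
        # A single forward pass -- no inversion or iterative denoising,
        # which is what makes TSG fast compared to reconstruction methods.
        eps = unet(image, timesteps)
    return eps  # feature map later fed to a ResNet classifier
```

The returned feature map is what `compute_feature.py` saves to the `0_real`/`1_fake` folders described below.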
git clone https://github.com/NuayHL/TimeStepGenerating.git
conda create -n tsg python=3.9
conda activate tsg
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
[GenImage] https://github.com/GenImage-Dataset/GenImage
This repository is the official repository of the GenImage
benchmark and contains the GenImage dataset and the evaluated methods.
Place the ai/fake and nature/real images under separate folders. We recommend the following folder structure.
└── YOUR_DATASET
├── train
| ├── 0_real
| | ├── xxx.jpg
| | ├── xxx.jpg
| | ...
| | └── xxx.jpg
| └── 1_fake
| ├── xxx.jpg
| ├── xxx.jpg
| ...
| └── xxx.jpg
├── val
...
└── test
...
For a given dataset `DATASET_X` in GenImage, its corresponding TSG feature images should be placed using the following file structure. We recommend creating these empty folders before generating the TSG features.
`0_real` is for the TSG features generated from nature/real images; `1_fake` is for the TSG features generated from ai/fake images.
└── TimeStepGenerating/data
├── train
| ├── DATASET_X
| | ├── 0_real
| | └── 1_fake
├── test
| ├── DATASET_X
| | ├── 0_real
| | └── 1_fake
├── val
| ├── DATASET_X
| | ├── 0_real
| | └── 1_fake
... ...
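As a convenience, the empty folder layout above can be created in one step; this is a small helper sketch (`make_tsg_dirs` is not part of the repository, and `DATASET_X` remains a placeholder for your GenImage dataset name).

```python
# Create the empty TSG feature folders ahead of generating features.
# `make_tsg_dirs` is a convenience helper, not part of this repository.
import os

def make_tsg_dirs(root="data", dataset="DATASET_X"):
    for split in ("train", "val", "test"):
        for label in ("0_real", "1_fake"):
            os.makedirs(os.path.join(root, split, dataset, label),
                        exist_ok=True)

make_tsg_dirs()  # e.g. creates data/train/DATASET_X/0_real, ...
```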
Replace `DATASET_X` with the name of the corresponding dataset in GenImage to perform the experiments.
First, you need to create `train`/`val`/`test` TSG features for `DATASET_X`. Since the GenImage benchmark does not include a `test` split, please copy the `val` TSG features to the `test` folder under the same dataset name after you have obtained the `val` TSG features.
Download the pre-trained diffusion checkpoint `256x256_diffusion_uncond.pt`, which can be found in our Pre-trained Models storage or on the official guided-diffusion site.
Compute features using `guided-diffusion/compute_feature.py`:
- `--images_dir`: the original images directory for `DATASET_X`, usually split by ai/nature and train/val.
- `--recons_dir`: the TSG features directory for `DATASET_X`, created in advance in Creating Default Data Folder.
- `--model_path`: the pre-trained diffusion model checkpoint.
- `--time_step`: the time step parameter for generating the TSG features, ranging from 0 to 50.

train
python -u guided-diffusion/compute_feature.py --images_dir=DATASET_X/train/ai \
--recons_dir=data/train/DATASET_X/1_fake \
--model_path=256x256_diffusion_uncond.pt \
--time_step=0
python -u guided-diffusion/compute_feature.py --images_dir=DATASET_X/train/nature \
--recons_dir=data/train/DATASET_X/0_real \
--model_path=256x256_diffusion_uncond.pt \
--time_step=0
val
python -u guided-diffusion/compute_feature.py --images_dir=DATASET_X/val/ai \
--recons_dir=data/val/DATASET_X/1_fake \
--model_path=256x256_diffusion_uncond.pt \
--time_step=0
python -u guided-diffusion/compute_feature.py --images_dir=DATASET_X/val/nature \
--recons_dir=data/val/DATASET_X/0_real \
--model_path=256x256_diffusion_uncond.pt \
--time_step=0
test
GenImage
cp -r data/val/DATASET_X/* data/test/DATASET_X/
Custom dataset with a test split
python -u guided-diffusion/compute_feature.py --images_dir=DATASET_X/test/1_fake \
--recons_dir=data/test/DATASET_X/1_fake \
--model_path=256x256_diffusion_uncond.pt \
--time_step=0
python -u guided-diffusion/compute_feature.py --images_dir=DATASET_X/test/0_real \
--recons_dir=data/test/DATASET_X/0_real \
--model_path=256x256_diffusion_uncond.pt \
--time_step=0
Train the ResNet-50 classifier using `train.py`:
- `--exp_name`: a custom experiment name you create, used for the train/val results and the classifier checkpoint.
- `--datasets`: the training TSG feature dataset.
- `--datasets_test`: the val TSG feature dataset used during training, under `data/val`.
python train.py --gpus 0 --exp_name YOUR_EXP_NAME --datasets DATASET_X --datasets_test DATASET_X
Test using `test.py`:
- `--exp_name`: the custom experiment name you created during training, used for the train/val results and the classifier checkpoint.
- `--datasets_test`: the test TSG feature dataset, under `data/test`.
python test.py --gpus 0 --exp_name YOUR_EXP_NAME --ckpt model_epoch_best.pth --datasets_test DATASET_X
https://drive.google.com/drive/folders/15pFlz_YQibWznzsmy1279mfev4wZ2FAs?usp=drive_link
Includes the pre-trained classifier checkpoints from our paper.
These models are trained for only 1 epoch to prevent over-fitting.
However, training for more epochs is expected to yield better cross-dataset performance.
This code is developed based on DIRE. Thanks for sharing their code and models.
If you find this work useful for your research, please cite our paper:
@article{zeng2024tsg,
title={Time Step Generating: A Universal Synthesized Deepfake Image Detector},
author={Zeng, Ziyue and Liu, Haoyuan and Peng, Dingjie and Jin, Luoxu and Watanabe, Hiroshi},
journal={arXiv preprint arXiv:2411.11016},
year={2024}}