
2D3MF: Deepfake Detection using Multi Modal Middle Fusion

[!CAUTION] This repo is under active development. No hyperparameter tuning has been performed yet, so the current architecture is not yet optimal for deepfake detection.

This repo is the implementation for the paper 2D3MF: Deepfake Detection using Multi Modal Middle Fusion.

Repository Structure

```
.
├── assets                # Images for README.md
├── LICENSE
├── README.md
├── MODEL_ZOO.md
├── CITATION.cff
├── .gitignore
├── .github

# below is for the PyPI package marlin-pytorch
├── src                   # Source code for marlin-pytorch and audio feature extractors
├── tests                 # Unit tests
├── requirements.lib.txt
├── setup.py
├── init.py
├── version.txt

# below is for the paper implementation
├── configs              # Configs for experiment settings
├── TD3MF                # 2D3MF model code
├── preprocess           # Preprocessing scripts
├── dataset              # Dataloaders
├── utils                # Utility functions
├── train.py             # Training script
├── evaluate.py          # Evaluation script
└── requirements.txt
```

Installing and running our model

Feature Extraction - 2D3MF

Install 2D3MF from PyPI:

```bash
pip install 2D3MF
```

Sample code snippet for feature extraction:

```python
from TD3MF.classifier import TD3MF

# Load a fine-tuned 2D3MF checkpoint
ckpt = "ckpt/celebvhq_marlin_deepfake_ft/last-v72.ckpt"
model = TD3MF.load_from_checkpoint(ckpt)

# Extract multimodal features from a video clip
features = model.feature_extraction("2D3MF_Datasets/test/SampleVideo_1280x720_1mb.mp4")
```
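
For example, features can be extracted for a whole folder of clips and cached to disk. This is a minimal sketch, assuming `feature_extraction` returns a tensor-like object (the exact return type is not specified above) and using illustrative paths:

```python
# Minimal sketch: batch feature extraction and caching (paths are illustrative).
from pathlib import Path

import torch
from TD3MF.classifier import TD3MF

model = TD3MF.load_from_checkpoint("ckpt/celebvhq_marlin_deepfake_ft/last-v72.ckpt")

for clip in Path("2D3MF_Datasets/test").glob("*.mp4"):
    feats = model.feature_extraction(str(clip))    # assumed tensor-like output
    torch.save(feats, clip.with_suffix(".pt"))     # cache next to the clip
    print(clip.name, getattr(feats, "shape", "unknown shape"))
```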

We provide some pretrained MARLIN checkpoints and configurations [here]()

Paper Implementation

Requirements:

Installation

Install PyTorch from the official website

Clone the repo and install the requirements:

```bash
git clone https://github.com/aiden200/2D3MF
cd 2D3MF
pip install -e .
```
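
As a quick sanity check that the editable install exposes the package used in the feature-extraction snippet above (an assumption about the package layout, not an official check):

```python
# Post-install check: this import should succeed if `pip install -e .`
# exposed the TD3MF package (assumed layout, matching the snippet above).
from TD3MF.classifier import TD3MF

print("2D3MF import OK:", TD3MF.__name__)
```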

Training

1. Download Datasets

Forensics++

We cannot offer the download script directly in our repository due to the dataset's terms of use. Please follow the instructions on the [Forensics++](https://github.com/ondyari/FaceForensics?tab=readme-ov-file) page to obtain the download script.

#### Storage

```bash
- FaceForensics++
  - The original downloaded source videos from YouTube: 38.5GB
  - All H.264 compressed videos by compression rate factor
    - raw/0: ~500GB
    - 23: ~10GB (which we use)
```

#### Downloading the data

Please download the [Forensics++](https://github.com/ondyari/FaceForensics?tab=readme-ov-file) dataset. We used all lightly compressed original and altered videos of the three manipulation methods, i.e., the script in the Forensics++ repository that ends with ` -d all -c c23 -t videos`.

The script offers two servers, selected with `--server`. If the `EU` server is not working for you, you can also try `EU2`, which has been reported to work in some of those cases.

#### Audio download

Once the first two steps are executed, you should have the following structure:

```bash
-- Parent_dir
  |-- manipulated_sequences
  |-- original_sequences
```

Since the Forensics++ dataset does not provide audio data, we need to extract it ourselves (an illustrative sketch of this kind of extraction appears after the dataset list below). Please run the script in the Forensics++ repository that ends with ` -d original_youtube_videos_info`.

Now you should have a directory with the following structure:

```bash
-- Parent_dir
  |-- manipulated_sequences
  |-- original_sequences
  |-- downloaded_videos_info
```

Then run the script from our repository:

```bash
python3 preprocess/faceforensics_scripts/extract_audio.py --dir [Parent_dir]
```

After this, you should have a directory with the following structure:

```bash
-- Parent_dir
  |-- manipulated_sequences
  |-- original_sequences
  |-- downloaded_videos_info
  |-- audio_clips
```

#### References

- Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Nießner. "FaceForensics++: Learning to Detect Manipulated Facial Images." In _International Conference on Computer Vision (ICCV)_, 2019.
Other datasets:

- DFDC: Kaggle provides a convenient way to download the [DFDC dataset](https://www.kaggle.com/c/deepfake-detection-challenge/data).
- DeepfakeTIMIT: We recommend downloading the data from the [DeepfakeTIMIT Zenodo record](https://zenodo.org/records/4068245).
- FakeAVCeleb: We recommend requesting access to FakeAVCeleb via their [repo README](https://github.com/DASH-Lab/FakeAVCeleb).
- RAVDESS: We recommend downloading the data from the [RAVDESS Zenodo record](https://zenodo.org/records/1188976).
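
For reference, the sketch below shows roughly what extracting audio tracks from downloaded source videos looks like with ffmpeg. It is an illustration only, not the repo's `extract_audio.py`; the directory names and the 16 kHz mono output are assumptions:

```python
# Illustration only: pull a mono 16 kHz .wav out of each video with ffmpeg.
# The repo's preprocess/faceforensics_scripts/extract_audio.py handles the real
# Forensics++ layout; the paths below are hypothetical.
import subprocess
from pathlib import Path


def extract_wav(video_path: Path, out_dir: Path, sample_rate: int = 16000) -> Path:
    out_dir.mkdir(parents=True, exist_ok=True)
    wav_path = out_dir / f"{video_path.stem}.wav"
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", str(video_path),
            "-vn",                    # drop the video stream
            "-ac", "1",               # mono
            "-ar", str(sample_rate),  # resample
            str(wav_path),
        ],
        check=True,
    )
    return wav_path


if __name__ == "__main__":
    for mp4 in Path("Parent_dir/downloaded_videos").glob("*.mp4"):  # hypothetical dir
        extract_wav(mp4, Path("Parent_dir/audio_clips"))
```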

2. Preprocess the dataset

We recommend using the following unified dataset structure:

```
2D3MF_Dataset/
├── DeepfakeTIMIT
│   ├── audio/*.wav
│   └── video/*.mp4
├── DFDC
│   ├── audio/*.wav
│   └── video/*.mp4
├── FakeAVCeleb
│   ├── audio/*.wav
│   └── video/*.mp4
├── Forensics++
│   ├── audio/*.wav
│   └── video/*.mp4
└── RAVDESS
    ├── audio/*.wav
    └── video/*.mp4
```
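
Before running the preprocessing step below, it can help to sanity-check this layout. A minimal sketch, assuming audio and video files share the same file stem (which may not hold for every dataset):

```python
# Sanity-check the unified 2D3MF_Dataset layout: each video should have a
# matching .wav. This is an illustrative helper, not a repo script.
from pathlib import Path

DATASETS = ["DeepfakeTIMIT", "DFDC", "FakeAVCeleb", "Forensics++", "RAVDESS"]


def check_layout(root: str = "2D3MF_Dataset") -> None:
    for name in DATASETS:
        video_dir = Path(root) / name / "video"
        audio_dir = Path(root) / name / "audio"
        videos = sorted(video_dir.glob("*.mp4"))
        missing = [v.name for v in videos
                   if not (audio_dir / f"{v.stem}.wav").exists()]
        print(f"{name}: {len(videos)} videos, {len(missing)} without matching audio")


if __name__ == "__main__":
    check_layout()
```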

Crop the face region from the raw videos. Run:

```bash
python3 preprocess/preprocess_clips.py --data_dir [Dataset_Dir]
```
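
For intuition, the sketch below shows what per-frame face cropping involves, using OpenCV's Haar cascade detector. It is not `preprocess_clips.py`; the 224×224 crop size, the largest-face heuristic, and the paths are assumptions:

```python
# Illustration only: crop the largest detected face from each frame of a clip.
# This is NOT preprocess/preprocess_clips.py; the 224x224 output size and the
# Haar-cascade detector are assumptions for the sketch.
import cv2


def crop_faces(video_path: str, out_path: str, size: int = 224) -> None:
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    writer = cv2.VideoWriter(
        out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (size, size)
    )
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            continue  # no face found in this frame; skip it
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
        writer.write(cv2.resize(frame[y:y + h, x:x + w], (size, size)))
    cap.release()
    writer.release()


if __name__ == "__main__":
    # hypothetical paths for illustration
    crop_faces("2D3MF_Dataset/DFDC/video/example.mp4", "example_cropped.mp4")
```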

3. Extract features from pretrained models

EfficientFace: Download the pre-trained EfficientFace model from [here](https://github.com/zengqunzhao/EfficientFace) under "Pre-trained models". In our experiments, we use the model pre-trained on AffectNet7, i.e., `EfficientFace_Trained_on_AffectNet7.pth.tar`. Please place it under the `pretrained` directory.

Run:

```bash
python preprocess/extract_features.py --data_dir /path/to/data --video_backbone [VIDEO_BACKBONE] --audio_backbone [AUDIO_BACKBONE]
```

[VIDEO_BACKBONE] can be replaced with one of the following:

[AUDIO_BACKBONE] can be replaced with one of the following:

Optionally, add the `--Forensics` flag at the end if Forensics++ is the dataset being processed.

In our paper, we found that EAT works best as the audio backbone.

Split the train, val, and test sets by running:

```bash
python preprocess/gen_split.py --data_dir /path/to/data --test 0.1 --val 0.1 --feat_type [AUDIO_BACKBONE]
```

Note that pre-trained checkpoints for the video and audio backbones can be downloaded from MODEL_ZOO.md.
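
When processing several datasets, the two commands above can be chained in a small driver script. The sketch below uses only the flags documented above; the backbone names and per-dataset directories are placeholders:

```python
# Sketch: drive the two preprocessing commands above for several datasets.
# Only flags documented in this README are used; the backbone names and the
# per-dataset directories are placeholders.
import subprocess

VIDEO_BACKBONE = "marlin"  # placeholder; see the [VIDEO_BACKBONE] options above
AUDIO_BACKBONE = "eat"     # placeholder; see the [AUDIO_BACKBONE] options above
DATA_DIRS = ["2D3MF_Dataset/DFDC", "2D3MF_Dataset/Forensics++"]

for data_dir in DATA_DIRS:
    extract_cmd = [
        "python", "preprocess/extract_features.py",
        "--data_dir", data_dir,
        "--video_backbone", VIDEO_BACKBONE,
        "--audio_backbone", AUDIO_BACKBONE,
    ]
    if "Forensics" in data_dir:
        extract_cmd.append("--Forensics")  # documented flag for Forensics++
    subprocess.run(extract_cmd, check=True)

    subprocess.run(
        [
            "python", "preprocess/gen_split.py",
            "--data_dir", data_dir,
            "--test", "0.1", "--val", "0.1",
            "--feat_type", AUDIO_BACKBONE,
        ],
        check=True,
    )
```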

4. Train and evaluate

Train and evaluate the 2D3MF model.

Please use the configs in `config/*.yaml` as the config file. For example:

```bash
python evaluate.py \
    --config /path/to/config \
    --data_path /path/to/CelebV-HQ \
    --num_workers 4 \
    --batch_size 16
```

```bash
python evaluate.py \
    --config /path/to/config \
    --data_path /path/to/dataset \
    --num_workers 4 \
    --batch_size 8 \
    --marlin_ckpt pretrained/marlin_vit_base_ytf.encoder.pt \
    --epochs 300
```

```bash
python evaluate.py \
    --config config/celebvhq_marlin_deepfake_ft.yaml \
    --data_path 2D3MF_Datasets \
    --num_workers 4 \
    --batch_size 1 \
    --marlin_ckpt pretrained/marlin_vit_small_ytf.encoder.pt \
    --epochs 300
```

Optionally, add

```bash
--skip_train --resume /path/to/checkpoint
```

to skip training and resume from a checkpoint.

5. Configuration File

Set up a configuration file based on your hyperparameters and backbones. You can find an example config file under `config/`.

Explanation:

6. Performing Grid Search

7. Monitoring Performance

Run:

```bash
tensorboard --logdir=lightning_logs/
```

TensorBoard should then be available at http://localhost:6006/.

License

This project is under the CC BY-NC 4.0 license. See LICENSE for details.

References

Please cite our work!

Acknowledgements

Some of the model code is based on ControlNet/MARLIN. The middle fusion code is adapted from "Self-attention fusion for audiovisual emotion recognition with incomplete data".

Our Audio Feature Extraction Models:

Our Video Feature Extraction Models: