
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer


This repo contains an official PyTorch implementation of our paper: IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer.

(Figure: overview of IML-ViT)

📰 1 News

2 Our environment

Ubuntu 20.04.1 LTS

CUDA 11.7 + cuDNN 8.4.0

Python 3.8

PyTorch 1.11

3 Quick Start

3.1 Google Colab Demo

3.2 A Simple Offline Demo

Currently, you can follow the tutorial to experience the running pipeline of IML-ViT. The only difference from the Colab version is the lack of a playground for testing online images.

3.3 Training on your datasets

The training code for IML-ViT is now released!

First, prepare your dataset to fit the protocol of our dataloader for a quick start, or design your own dataloader and modify the corresponding interfaces.

3.3.1 Prepare IML Datasets

You may reorganize your dataset to match the mani_dataset layout, or generate a JSON file for each dataset you want to train or test on. We have prepared a naive IML transforms class and an edge-mask generator class; you can call them directly through json_dataset or mani_dataset in ./utils/datasets.py to check that your reorganization is correct.
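
As a quick sanity check, the snippet below is a hypothetical sketch of loading data through these classes; the exact constructor arguments and returned fields may differ, so please check the class definitions in ./utils/datasets.py before relying on it.

from utils.datasets import mani_dataset, json_dataset

# Folder-style dataset organized according to the repository's protocol.
train_set = mani_dataset("<Your custom dataset path>/CASIA2.0")

# JSON-style dataset described by a JSON file listing image/mask pairs.
test_set = json_dataset("<Your path to a dataset JSON file>")

print(len(train_set))  # number of samples discovered
sample = train_set[0]  # fetch one sample to confirm the layout is parsed correctly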

3.3.2 Prepare the Pre-trained Weights

You may follow the instructions in the Masked Autoencoder (MAE) repository to download the pre-trained weights before training. Thanks to the MAE authors for their impressive work and open-source contribution!
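
For reference, the snippet below shows one possible way to fetch the checkpoint from Python; the URL is the ViT-Base checkpoint published by the MAE authors, so please cross-check it against their instructions before use.

import torch.hub

# MAE pre-trained ViT-Base checkpoint (verify the URL against the official MAE repository).
MAE_URL = "https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth"
torch.hub.download_url_to_file(MAE_URL, "mae_pretrain_vit_base.pth")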

3.3.3 Start Training Script

The main entry point is main_train.py; you can launch training on Linux with the following script:

torchrun \
  --standalone \
  --nnodes=1 \
  --nproc_per_node=1 \
main_train.py \
  --world_size 1 \
  --batch_size 1 \
  --data_path "<Your custom dataset path>/CASIA2.0" \
  --epochs 200 \
  --lr 1e-4 \
  --min_lr 5e-7 \
  --weight_decay 0.05 \
  --edge_lambda 20 \
  --predict_head_norm "BN" \
  --vit_pretrain_path "<Your path to pre-trained weights>/mae_pretrain_vit_base.pth" \
  --test_data_path "<Your custom dataset path>/CASIA1.0" \
  --warmup_epochs 4 \
  --output_dir ./output_dir/ \
  --log_dir ./output_dir/  \
  --accum_iter 8 \
  --seed 42 \
  --test_period 4 \
  --num_workers 4 \
  2> train_error.log 1>train_log.log

  • data_path: path to the training dataset
  • test_data_path: path to the test dataset evaluated periodically during training
  • vit_pretrain_path: path to the MAE pre-trained ViT weights

Replace the paths in <> with your own. The default settings are the generally recommended training parameters; if you have a more powerful device, you can increase the batch size and adjust the other parameters accordingly.

Note that the predict_head_norm parameter, i.e. the normalization type used in the prediction head, may greatly influence the performance of the model. We tested three different types of normalization in the decoder head, and they may yield different results depending on the dataset configuration and other factors. Some intuitive conclusions are as follows:

  • "LN" -> Layer norm: the fastest convergence, but poor generalization performance.
  • "BN" -> Batch norm: when authentic images are included during training, batch_size = 2 may perform poorly; if you can train with a larger batch size (e.g., an NVIDIA A40 with 48 GB memory can fit batch_size = 4), it may perform better.
  • "IN" -> Instance norm: a form that reliably converges, equivalent to BatchNorm with batch_size = 1. When abnormal behavior is observed with BatchNorm, consider trying Instance Normalization. Note that in this case nn.InstanceNorm2d should be created with track_running_stats=True and affine=True, rather than the PyTorch defaults (see the sketch after this list).
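
The snippet below is a minimal, self-contained illustration of these three options (the channel count is made up for the example; the actual prediction-head definition in this repository may differ in detail):

import torch
import torch.nn as nn

num_features = 256  # illustrative channel count only

norm_layers = {
    # One common way to apply LayerNorm over the channels of a conv feature map;
    # the in-repo implementation may differ.
    "LN": nn.GroupNorm(1, num_features),
    # BatchNorm: statistics are shared across the batch, so very small batch
    # sizes can hurt performance.
    "BN": nn.BatchNorm2d(num_features),
    # InstanceNorm: explicitly enable affine parameters and running statistics,
    # since PyTorch defaults both to False.
    "IN": nn.InstanceNorm2d(num_features, affine=True, track_running_stats=True),
}

x = torch.randn(2, num_features, 64, 64)  # dummy feature map
for name, layer in norm_layers.items():
    print(name, tuple(layer(x).shape))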

We sincerely welcome reports of other strange or surprising findings about the parameter settings in the Issues. This can contribute to a more comprehensive understanding of the inherent properties of IML-ViT in the research community.

For more information, run python main_train.py -h to see the full help list of the command-line arguments.

3.3.4 Monitor the Training Process

We recommend monitoring the training process by checking the redirected log files (train_log.log and train_error.log from the script above) and, if TensorBoard logging is enabled in your run, by pointing TensorBoard at the directory passed to --log_dir.

3.3.5 Visualize the results

You can use our Colab demo or the offline demo to check the performance of your trained IML-ViT model; the only difference is replacing the default checkpoint with your own.
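
For example, loading your own checkpoint into the demo could look roughly like the hypothetical sketch below; the checkpoint file name and dictionary keys follow the MAE-style training loop and may differ in your run.

import torch

# Replace with a checkpoint written to --output_dir during your training run.
ckpt = torch.load("./output_dir/checkpoint-<epoch>.pth", map_location="cpu")
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
# model.load_state_dict(state_dict)  # `model` is the IML-ViT instance built in the demo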

4 Links

If you want to train this model with the CASIAv2 dataset, we provide a revised version of CASIAv2 that corrects several mistakes in the original dataset provided by the authors. Details can be found in our revised CASIAv2 dataset repository.


5 Citation

If you find our work interesting or helpful, please don't hesitate to give us a star 🌟 and cite our paper 🥰! Your support truly encourages us!

@misc{ma2023imlvit,
      title={IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer}, 
      author={Xiaochen Ma and Bo Du and Zhuohang Jiang and Ahmed Y. Al Hammadi and Jizhe Zhou},
      year={2023},
      eprint={2307.14863},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

6 Statistics of This Repo

[![Star History Chart](https://api.star-history.com/svg?repos=SunnyHaze/IML-ViT&type=Date)](https://star-history.com/#SunnyHaze/IML-ViT&Date)