Yixuan423 / FakeBench

The released data for the paper entitled "FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models"
MIT License

FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models

:octocat:The released data and evaluation code for the paper entitled "FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models"

FakeBench

:speak_no_evil:Please note that, to avoid corpus leakage, we do not release the labeled data; only the evaluation queries and code are released.

:ok_woman:You are welcome to use the evaluation code and submit your model responses to us to obtain the performance measures.

:ok_woman:See the Submission Guidelines below for more details.

:bow:If you find our work useful, please give us a star :star2: for this repository!

Brief Introduction

The ability to distinguish whether an image is generated by artificial intelligence (AI) is a crucial ingredient in human intelligence, usually accompanied by a complex and dialectical forensic and reasoning process. However, current fake image detection models and databases focus on binary classification without understandable explanations for the general populace. This weakens the credibility of authenticity judgment and may conceal potential model biases. Meanwhile, large multimodal models (LMMs) have exhibited immense vision-language capabilities on various tasks, bringing the potential for explainable fake image detection. Therefore, we pioneer the probe of LMMs for explainable fake image detection by presenting a multimodal database encompassing textual authenticity descriptions, the FakeBench. For construction, we first introduce a fine-grained taxonomy of generative visual forgery concerning human perception, based on which we collect forgery descriptions in human natural language with a human-in-the-loop strategy. FakeBench examines LMMs with four evaluation criteria: detection, reasoning, interpretation, and fine-grained forgery analysis, to obtain deeper insights into image authenticity-relevant capabilities. Experiments on various LMMs confirm their merits and demerits in different aspects of fake image detection tasks. This research presents a paradigm shift towards transparency for the fake image detection area and reveals the need for greater emphasis on forensic elements in visual-language research and AI risk control.

Image Data

The 6,000 fake and real images can be downloaded via: https://portland-my.sharepoint.com/:f:/g/personal/yixli5-c_my_cityu_edu_hk/EoGF50mSDkxNoRzzQq8xZHAB8sVd6Ab2NN57W5nDaChEVQ?e=1fIWUw

Submission Guidelines

To avoid corpus leakage, the labeled data is not publicly released; only the query data is publicly available. You are encouraged to submit your model's responses to us by following the steps below:

Step 1. Download the images.

Please download the image part of FakeBench and place the images under the FakeBench_images folder. By default, this folder contains two subfolders, fake_images and real_images. You need to switch manually between the real and fake image folders to test them separately, or you can put all images together in one folder and change the testing directory accordingly.
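As a rough illustration of the folder layout described in Step 1, the sketch below walks both subfolders in one pass and tags each image with the folder it came from. The function name `collect_images` and the set of file extensions are assumptions made for this example; they are not part of the released evaluation code.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the downloaded files.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def collect_images(root="FakeBench_images", subfolders=("fake_images", "real_images")):
    """Return a sorted list of (image_path, subfolder_name) pairs.

    Hypothetical helper: the folder names follow Step 1, everything else
    is an assumption for illustration.
    """
    root = Path(root)
    items = []
    for sub in subfolders:
        folder = root / sub
        if not folder.is_dir():
            # Skip a missing subfolder instead of failing, e.g. when all
            # images were merged into a single directory.
            continue
        for path in sorted(folder.iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
                items.append((path, sub))
    return items


if __name__ == "__main__":
    images = collect_images()
    print(f"Found {len(images)} images")
```

If all images are placed in one folder instead, the same helper can be called with that folder as `root` and a single-entry `subfolders` tuple.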

NOTE: FakeBench is a purely scientific, non-profit, non-commercial research project. Use of the data must strictly comply with the Creative Commons Attribution-NonCommercial (CC BY-NC) license. If the data is found to have been used for training models, we reserve the right to take ALL measures.

Step 2. Evaluation

Replace the ChatModel in each eval_*.py with your own model. The example is based on GeminiPro v1.
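The interface that the eval_*.py scripts expect from ChatModel is not documented here, so the following is only a sketch of one way to wrap a model behind a single call. The `chat(image_path, prompt)` method, the `backend` callable, and `echo_backend` are all hypothetical names chosen for this example; check the released scripts for the actual signature before adapting it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChatModel:
    """Hypothetical adapter around any callable (image_path, prompt) -> text.

    This is NOT the class shipped with FakeBench; it only illustrates
    keeping model-specific code behind one method.
    """

    backend: Callable[[str, str], str]
    name: str = "my-lmm"  # assumed field, reported alongside the results

    def chat(self, image_path: str, prompt: str) -> str:
        reply = self.backend(image_path, prompt)
        # Normalize to a stripped string so the saved responses stay clean.
        return str(reply).strip()


def echo_backend(image_path: str, prompt: str) -> str:
    # Stand-in for a real model call; replace with your model's inference code.
    return f" [stub] {prompt} "


if __name__ == "__main__":
    model = ChatModel(backend=echo_backend)
    print(model.chat("example.png", "Is this image real or fake?"))
```

Keeping the model call behind one method means the same wrapper could be reused across the three evaluation scripts, assuming they call the model in a similar way.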

(1) Evaluate on FakeClass:

python eval/eval_FakeClass.py

(2) Evaluate on FakeClue:

python eval/eval_FakeClue.py

Note that FakeClue contains two evaluation modes, i.e., faultfinding mode and inference mode. If you don't need both modes, comment out the related code in the .py file.

(3) Evaluate on FakeQA:

python eval/eval_FakeQA.py

Step 3. Submit the results

Note: Please make sure that all your results are UTF-8 encoded. You should get 4 .json files after evaluation. Please submit them to us, together with the name of your tested MLLM, through one of the following:

We will return the evaluation feedback to you as soon as possible.
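Before submitting, the UTF-8 requirement from Step 3 can be checked locally. The sketch below shows one way to write a result file as UTF-8 JSON and to verify that an existing file decodes as UTF-8 and parses as JSON. The helper names and the `results` directory are assumptions for illustration; the names of the four output files are set by the evaluation scripts and are not reproduced here.

```python
import json
from pathlib import Path


def save_utf8_json(obj, path):
    """Write obj as JSON with explicit UTF-8 encoding (hypothetical helper)."""
    # ensure_ascii=False keeps non-ASCII characters readable instead of \u-escaping them.
    text = json.dumps(obj, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def is_utf8_json(path):
    """Return True if the file decodes as UTF-8 and parses as JSON."""
    try:
        json.loads(Path(path).read_bytes().decode("utf-8"))
        return True
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False


if __name__ == "__main__":
    # "results" is an assumed output directory; point this at your own files.
    for result_file in sorted(Path("results").glob("*.json")):
        status = "ok" if is_utf8_json(result_file) else "NOT valid UTF-8 JSON"
        print(f"{result_file.name}: {status}")
```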

The Leaderboard

:smiley:If you would like your model to be tested on FakeBench, please contact us!

Detection

Interpretation

Reasoning

Fine-grained Analysis

Citation Information

If you find our paper useful, please kindly cite it.

@article{li2024fakebench,
  title={FakeBench: Uncover the Achilles' Heels of Fake Images with Large Multimodal Models},
  author={Li, Yixuan and Liu, Xuelin and Wang, Xiaoyang and Wang, Shiqi and Lin, Weisi},
  journal={arXiv preprint arXiv:2404.13306},
  year={2024}
}