chandlerbing65nm / FakeImageDetection

Official implementation of "Frequency Masking for Universal DeepFake Detection", accepted at ICASSP 2024.
https://arxiv.org/abs/2401.06506
Apache License 2.0

The accuracy metrics during testing are very unusual #3

Closed. jixiedy closed this issue 2 months ago.

jixiedy commented 6 months ago

Dear author, I downloaded data samples of a similar type from the web and built a large dataset (more than 100,000 samples) myself, following the dataset format used in Ojha_CVPR2023, and split the samples into train, test, and val in a 0.8/0.1/0.1 ratio.
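For context, a minimal sketch of such an 0.8/0.1/0.1 split; the dataset root and the `list_image_paths` helper are hypothetical, not part of this repository.

```python
# Minimal sketch of an 0.8/0.1/0.1 split (illustrative only; the dataset root
# and the helper below are hypothetical, not part of this repository).
import random
from pathlib import Path

def list_image_paths(root):
    # Collect all image files under a root directory (hypothetical layout).
    exts = {".png", ".jpg", ".jpeg"}
    return sorted(p for p in Path(root).rglob("*") if p.suffix.lower() in exts)

paths = list_image_paths("my_dataset")  # hypothetical dataset root
random.seed(0)
random.shuffle(paths)

n = len(paths)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train_paths = paths[:n_train]
val_paths = paths[n_train:n_train + n_val]
test_paths = paths[n_train + n_val:]
print(len(train_paths), len(val_paths), len(test_paths))
```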

When I train and test with your method using the CLIP model, the train and val results are good, with acc and ap above 92%, but the test results are consistently poor, with acc a little over 70% and ap below 90%.

I don't know what is wrong; could you please tell me what the reason might be? Is it because your test_augment applies different transforms and data-augmentation methods than train_augment and val_augment? I am really confused and would appreciate your answer.

chandlerbing65nm commented 6 months ago

> I downloaded data samples of a similar type from the web and built a large dataset (more than 100,000 samples) myself, following the dataset format used in Ojha_CVPR2023, and split the samples into train, test, and val in a 0.8/0.1/0.1 ratio.

The training conducted in our work adheres to the method outlined by Wang et al., comprising 720,000 real and fake images for deepfake detection. Personally, I do not anticipate its effectiveness on smaller datasets, since the primary goal of the work is generalization, and training on a large-scale dataset is a significant prerequisite for achieving this.

> When I train and test with your method using the CLIP model, the train and val results are good, with acc and ap above 92%, but the test results are consistently poor, with acc a little over 70% and ap below 90%.

Accuracy is not the metric used in this work; like other papers on this topic, we focus on average precision.
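For illustration, a minimal sketch (assuming scikit-learn is available) of how average precision differs from thresholded accuracy on the same scores; the label and score values are made up.

```python
# Minimal sketch: average precision is threshold-free, while accuracy depends
# on a fixed 0.5 cutoff (illustrative values, assuming scikit-learn).
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score

y_true = np.array([0, 0, 0, 1, 1, 1])                 # 0 = real, 1 = fake
y_score = np.array([0.1, 0.2, 0.6, 0.55, 0.7, 0.9])   # model scores for "fake"

acc = accuracy_score(y_true, y_score > 0.5)    # thresholded at 0.5
ap = average_precision_score(y_true, y_score)  # ranking-based, no threshold

print(f"accuracy: {acc:.3f}, average precision: {ap:.3f}")
```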

> I don't know what is wrong; could you please tell me what the reason might be? Is it because your test_augment applies different transforms and data-augmentation methods than train_augment and val_augment? I am really confused and would appreciate your answer.

It is not the augmentations (train, val, and test): we use the same augmentations as Wang et al., Gragnaniello et al., and Ojha et al.
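As a rough illustration of sharing one augmentation pipeline across train, val, and test, here is a torchvision-style sketch; the blur/JPEG perturbation and its parameters are placeholders rather than the exact settings of Wang et al. or this repository, and the normalization constants are the standard CLIP statistics assumed here.

```python
# Rough illustration only: one shared augmentation pipeline handed to the
# train, val, and test datasets. The blur/JPEG perturbation and its parameters
# are placeholders, not the exact settings of Wang et al. or this repository.
import random
from io import BytesIO
from PIL import Image, ImageFilter
from torchvision import transforms

def blur_jpeg(img, p=0.1, quality=75, radius=1.0):
    # Randomly apply Gaussian blur and JPEG re-compression to an RGB PIL image
    # (placeholder probabilities and strengths).
    if random.random() < p:
        img = img.filter(ImageFilter.GaussianBlur(radius))
    if random.random() < p:
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    return img

shared_augment = transforms.Compose([
    transforms.Lambda(blur_jpeg),
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    # Standard CLIP normalization statistics (assumed here).
    transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],
                         std=[0.26862954, 0.26130258, 0.27577711]),
])

# The key point: the same pipeline is used for every split.
train_augment = val_augment = test_augment = shared_augment
```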