chandlerbing65nm / FakeImageDetection

Official implementation of "Frequency Masking for Universal Deepfake Detection", accepted to ICASSP 2024.
https://arxiv.org/abs/2401.06506
Apache License 2.0

Some important issues regarding your project #4

Closed: jixiedy closed this issue 1 week ago

jixiedy commented 3 months ago

Thank you very much for open-sourcing the code for the paper "Frequency Masking for Universal Deepfake Detection." While using your open-source code and the shared datasets for training and testing, I encountered several issues and would greatly appreciate your assistance:

1. Regarding the diffusion_datasets from the OjhaCVPR23 dataset that you shared, it appears that each subset contains only real samples or only fake samples, never both. Is this correct?

2. Assuming there are no issues with the OjhaCVPR23 diffusion_datasets, I encountered the following problems during testing: ① The OjhaCVPR23(Dataset) class in dataset.py does not work as expected because each subset contains only real or only fake samples. To avoid errors, the conditional statement if '1_fake' in sub_folders and '0_real' in sub_folders needs to be split into separate if/elif branches (one branch for '1_fake', another for '0_real') to handle this situation. ② However, this adjustment leads to another critical issue: when testing on the OjhaCVPR23 dataset, every subset's avg_ap becomes 100%. ③ To address this, the testing code also needs modifications, for example computing unique_classes = np.unique(y_true) and only evaluating when more than one class is present. Since every subset in the OjhaCVPR23 dataset contains only real or only fake samples, there is a single class, but auc = roc_auc_score(y_true, y_pred) requires both classes to be present (a minimal sketch of such a guard follows this list).

3. Regarding your open-sourced code, am I correct that it primarily builds upon Ojha et al.'s work, specifically the "Ojha et al. [1] + Ours Blur+JPEG (0.1)" condition? I followed your parameter instructions and used the default best parameters provided in your code. However, the test results on the Wang_CVPR20 dataset differ noticeably from the average precision (avg_ap) reported in your paper: DeepFake is approximately 2% lower, SITD about 4% lower, SAN about 1% higher, CRN about 6.5% higher, and IMLE about 1.8% higher.
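For illustration, here is a minimal sketch of the guard I describe in ③ above; the function name and return layout are my own, not code from this repository:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def safe_ap_auc(y_true, y_pred):
    # Guard described in point 2-③: only compute the metrics when both
    # classes (real and fake) appear in y_true; otherwise roc_auc_score
    # raises an error and average_precision_score is trivially 100%.
    unique_classes = np.unique(y_true)
    if len(unique_classes) > 1:
        return average_precision_score(y_true, y_pred), roc_auc_score(y_true, y_pred)
    return None, None
```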

I have attached my test results in the provided text files. Your insights on the above questions would be greatly appreciated. Thank you!

clipft_allspectralmask0.15_wang_cvpr20.txt clipft_spectralmask15_ojha_cvpr23.txt

chandlerbing65nm commented 3 months ago
> 1. Regarding the diffusion_datasets from the OjhaCVPR23 dataset that you shared, it appears that each subset contains only real samples or only fake samples, never both. Is this correct?
>
> 2. Assuming there are no issues with the OjhaCVPR23 diffusion_datasets, I encountered the following problems during testing: ① The OjhaCVPR23(Dataset) class in dataset.py does not work as expected because each subset contains only real or only fake samples. To avoid errors, the conditional statement if '1_fake' in sub_folders and '0_real' in sub_folders needs to be split into separate if/elif branches to handle this situation. ② However, this adjustment leads to another critical issue: when testing on the OjhaCVPR23 dataset, every subset's avg_ap becomes 100%. ③ To address this, the testing code also needs modifications, for example computing unique_classes = np.unique(y_true) and only evaluating when more than one class is present. Since every subset contains only real or only fake samples, there is a single class, but auc = roc_auc_score(y_true, y_pred) requires both classes to be present.

You can solve all of these issues by copying a 0_real subfolder into every OjhaCVPR23 dataset folder (e.g. dalle, glide_50_27, etc.). Use the 0_real subfolder from laion for all of them except guided, which should use the one from imagenet. A minimal copying sketch is shown below.
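In case it helps, here is a minimal sketch of that copy step, assuming the laion and imagenet folders (each containing a 0_real subfolder) sit next to the generator folders; the paths and folder list are illustrative, so adjust them to your local layout:

```python
import shutil
from pathlib import Path

# Illustrative root; point this at your local OjhaCVPR23 diffusion_datasets directory.
root = Path("diffusion_datasets")

# Extend this list with the remaining generator subsets in your copy of the dataset.
generator_folders = ["dalle", "glide_50_27", "guided"]

for name in generator_folders:
    # guided pairs with the imagenet real images; every other subset uses laion.
    source = root / ("imagenet" if name == "guided" else "laion") / "0_real"
    target = root / name / "0_real"
    if not target.exists():
        shutil.copytree(source, target)
```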

> Regarding your open-sourced code, am I correct that it primarily builds upon Ojha et al.'s work, specifically the "Ojha et al. [1] + Ours Blur+JPEG (0.1)" condition? I followed your parameter instructions and used the default best parameters provided in your code. However, the test results on the Wang_CVPR20 dataset differ noticeably from the average precision (avg_ap) reported in your paper: DeepFake is approximately 2% lower, SITD about 4% lower, SAN about 1% higher, CRN about 6.5% higher, and IMLE about 1.8% higher.

These margins of error are expected since you are using a different machine. As long as the deviations stay below 10%, the results are within the expected range.