[HELP!] The published weights fail to reproduce the results in the Readme Table.

dummerchen commented 3 days ago

I downloaded the pre-processed datasets and the data JSON files and extracted them. I then ran testing directly with the published weights, but I could not reproduce the results listed in the Readme Table. For example, for UCF:

| Dataset | FF++ | CDFv2 | DFD | DFDC | DFDCP | FaceShifter |
| --- | --- | --- | --- | --- | --- | --- |
| My Results (%) | 98.12 | 77.16 | 82.09 | 73.14 | 69.27 | 60.84 |
| Readme Table (%) | 97.05 | 75.27 | 80.74 | 71.91 | 75.94 | 64.62 |
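
To make the size and direction of each gap explicit, the per-dataset differences can be computed directly from the two rows. This is only arithmetic on the numbers already quoted; the helper name `auc_gaps` is made up for illustration.

```python
# Values (%) copied from the two rows quoted in this comment.
datasets = ["FF++", "CDFv2", "DFD", "DFDC", "DFDCP", "FaceShifter"]
mine = [98.12, 77.16, 82.09, 73.14, 69.27, 60.84]
readme = [97.05, 75.27, 80.74, 71.91, 75.94, 64.62]


def auc_gaps(names, reproduced, reported):
    """Return {dataset: reproduced - reported}, rounded to two decimals."""
    return {n: round(r - p, 2) for n, r, p in zip(names, reproduced, reported)}


gaps = auc_gaps(datasets, mine, readme)
for name, gap in gaps.items():
    # Positive means the reproduced number is higher than the README value.
    print(f"{name:<12}{gap:+.2f}")
```

Four datasets come out 1 to 2 points higher than the README, while DFDCP and FaceShifter come out lower (by 6.67 and 3.78 points), so the mismatch is not a uniform offset.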

My only changes were in the data-loading code. First, I set lmdb to False in test_config.yaml. Second, to make the data load correctly, I removed the line file_path = f'./{self.config["rgb_dir"]}\\'+file_path from the source code and instead edited the path prefixes in the downloaded data JSON so that cv2.imread can open the image paths. Finally, during testing the no_norm parameter of the __getitem__ function was set to False, which is the default value in the original code.
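
For anyone repeating the path-prefix step, a minimal sketch of one way to do it is shown here. It assumes nothing about the layout of the data JSON beyond it being nested dicts and lists with image paths stored as string values; `rewrite_prefix`, `rewrite_json_file`, and the example prefixes are hypothetical and not part of DeepfakeBench.

```python
import json


def rewrite_prefix(node, old_prefix, new_prefix):
    """Recursively replace old_prefix at the start of every string value.

    Dict keys are left untouched; only values are rewritten.
    """
    if isinstance(node, dict):
        return {k: rewrite_prefix(v, old_prefix, new_prefix) for k, v in node.items()}
    if isinstance(node, list):
        return [rewrite_prefix(v, old_prefix, new_prefix) for v in node]
    if isinstance(node, str) and node.startswith(old_prefix):
        return new_prefix + node[len(old_prefix):]
    return node


def rewrite_json_file(src, dst, old_prefix, new_prefix):
    """Load src, rewrite path prefixes, and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(rewrite_prefix(data, old_prefix, new_prefix), f)


# Hypothetical structure and prefixes, for illustration only.
example = {"some_dataset": {"frames": ["old_root/a/000.png", "elsewhere/b.png"]}, "n": 2}
print(rewrite_prefix(example, "old_root/", "/data/new_root/"))
```

Writing to a separate output file keeps the downloaded JSON intact, which makes it easier to rule out the edit itself as a cause of any metric difference.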

Here is the test log output for my CelebDF-v2 evaluation.

LogHandlers setup!
24-09-27 16:58:04: ~/05_deepfake/DeepfakeBench/training/pretrained/ucf_best.pth
Load pretrained model successfully!
Load pretrained model successfully!
===> Load checkpoint done!
24-09-27 16:58:07: dataset: Celeb-DF-v2
100%|█████████████████████████████████████████| 514/514 [00:53<00:00, 9.64it/s]
24-09-27 16:59:34: acc: 0.6489646772228989
24-09-27 16:59:34: auc: 0.7715841735863977
24-09-27 16:59:34: eer: 0.303202846975089
24-09-27 16:59:34: ap: 0.860614864869911
24-09-27 16:59:34: video_auc: 0.837309980171844
===> Test Done!
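
The log reports both auc and video_auc, and the 77.16 quoted for CDFv2 corresponds to the frame-level auc (0.7716), not to video_auc (0.8373), so it matters which of the two a table is compared against. The sketch below illustrates the difference under a loud assumption: that the video-level score is the mean of a video's frame scores. The repository's actual aggregation may differ, and `roc_auc`, `video_level`, and the toy data are made up for illustration.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """AUC as P(random positive scores above random negative); ties count 0.5.

    Quadratic in the number of samples, so suitable only for small checks;
    sklearn.metrics.roc_auc_score is the usual choice for real runs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def video_level(video_ids, labels, scores):
    """Average frame scores per video (ASSUMED aggregation).

    Assumes every frame of a video carries the same label.
    """
    grouped = defaultdict(list)
    video_label = {}
    for vid, y, s in zip(video_ids, labels, scores):
        grouped[vid].append(s)
        video_label[vid] = y
    vids = sorted(grouped)
    return ([video_label[v] for v in vids],
            [sum(grouped[v]) / len(grouped[v]) for v in vids])


# Toy data: four videos with two frames each.
video_ids = ["a", "a", "b", "b", "c", "c", "d", "d"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.8, 0.7, 0.3, 0.4, 0.1, 0.6]

frame_auc = roc_auc(labels, scores)
video_auc = roc_auc(*video_level(video_ids, labels, scores))
print(frame_auc, video_auc)
```

On this toy data one badly scored frame lowers the frame-level AUC while the per-video average still separates the classes, which is the same direction as the auc and video_auc values in the log.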

Has anyone else obtained the same results as mine? Is there an issue with my data, or is this just the expected outcome? Feel free to share any response you have.

Ftgn-dpA commented 2 days ago

Hello, I would like to ask about the memory size you use for testing. When I use 90GB of memory to test large datasets like DFDC, I encounter memory overflow issues, and the program is directly killed by the system.

dummerchen commented 2 days ago

There could be many reasons for this. I ran into it once while running FFD. I noticed that the source code converts each batch to NumPy and then to a list, which may lead to memory overflow, so I changed it as follows. This might help you.

        # label_lists+=list(data_dict['label'].cpu().detach().numpy())
        # prediction_lists+=list(predictions['prob'].cpu().detach().numpy())
        label_lists.extend(data_dict['label'].cpu().tolist())
        prediction_lists.extend(predictions['prob'].cpu().tolist())

Additionally, if you could share any of your test results, it would be greatly helpful for me.
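
The change above can be sketched as a standalone helper. This is a minimal sketch, not the repository's test loop: `to_python_list`, `collect_outputs`, the `keep_features` flag, and the `"feat"` key are hypothetical names for illustration, and only `data_dict['label']` and `predictions['prob']` come from the snippet above. The conversion is duck-typed so the sketch also runs on plain lists without torch installed.

```python
def to_python_list(values):
    """Convert a batch to plain Python numbers.

    Handles torch tensors (detach/cpu/tolist), NumPy arrays (tolist),
    and ordinary sequences.
    """
    if hasattr(values, "detach"):   # looks like a torch.Tensor
        values = values.detach().cpu()
    if hasattr(values, "tolist"):   # torch.Tensor or numpy.ndarray
        return values.tolist()
    return list(values)


def collect_outputs(batches, keep_features=False):
    """Accumulate labels and probabilities over (data_dict, predictions) pairs.

    Per-sample features are skipped unless keep_features is True. The "feat"
    key is an assumption about what the model returns.
    """
    labels, probs, feats = [], [], []
    for data_dict, predictions in batches:
        labels.extend(to_python_list(data_dict["label"]))
        probs.extend(to_python_list(predictions["prob"]))
        if keep_features:
            feats.extend(to_python_list(predictions["feat"]))
    return labels, probs, feats


# Plain lists stand in for tensors here.
batches = [
    ({"label": [0, 1]}, {"prob": [0.1, 0.9], "feat": [[1.0, 2.0], [3.0, 4.0]]}),
    ({"label": [1]}, {"prob": [0.7], "feat": [[5.0, 6.0]]}),
]
labels, probs, feats = collect_outputs(batches)
print(labels, probs, len(feats))
```

Labels and probabilities are one number per sample, whereas a feature is a whole vector per sample, so if the loop also stores features they are a plausible place for memory to grow on large test sets such as DFDC.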

Ftgn-dpA commented 2 days ago

Thank you for your answer. Unfortunately, my problem is still not solved. I need some time to solve the memory overflow issue before I can proceed with testing.

Ftgn-dpA commented 2 days ago

After I removed the feat list, the program was able to test normally. Here are my test results.

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 784/784 [00:15<00:00, 50.86it/s]
dataset: Celeb-DF-v1
acc: 0.7273596938775511
auc: 0.8111476783124101
eer: 0.2709891936824605
ap: 0.8872737559442951
pred: [0.0186842  0.04946956 0.01535116 ... 0.99977928 0.27962148 0.9058243 ]
video_auc: 0.8607809847198642
label: [0 0 0 ... 1 1 0]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4105/4105 [01:05<00:00, 62.40it/s]
dataset: Celeb-DF-v2
acc: 0.6491473812423874
auc: 0.7716043149466193
eer: 0.303202846975089
ap: 0.8606294546479984
pred: [0.00241998 0.07301135 0.03535697 ... 0.11471809 0.00247127 0.1319851 ]
video_auc: 0.8373265036351618
label: [0 1 0 ... 1 0 1]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4306/4306 [01:09<00:00, 62.28it/s]
dataset: DFDCP
acc: 0.5953431657182673
auc: 0.6927222200193006
eer: 0.35756651415014407
ap: 0.7917796749836938
pred: [0.97957271 0.24013394 0.01659615 ... 0.03577324 0.53850347 0.51035321]
video_auc: 0.7060478054811457
label: [1 1 1 ... 0 1 1]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25636/25636 [06:27<00:00, 66.09it/s]
dataset: DeepFakeDetection
acc: 0.7556489599282238
auc: 0.8209127796915247
eer: 0.25751792198119355
ap: 0.9740487375950206
pred: [0.99025506 0.99942648 0.30094466 ... 0.99634582 0.04713952 0.01019131]
video_auc: 0.8665735813930086
label: [0 1 1 ... 1 1 1]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33029/33029 [08:38<00:00, 63.70it/s]
dataset: DFDC
acc: 0.6575812165067062
auc: 0.731454782800722
eer: 0.33614162649174106
ap: 0.7506711082015504
pred: [0.00543757 0.96130598 0.03971711 ... 0.96157157 0.00561578 0.01371326]
video_auc: 0.7511150367911965
label: [1 1 0 ... 0 0 0]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 775/775 [00:13<00:00, 57.80it/s]
dataset: UADFV
acc: 0.8180058083252663
auc: 0.9204493391776915
eer: 0.1640826873385013
ap: 0.9249597230361158
pred: [0.00262206 0.99684215 0.9945963  ... 0.99878806 0.25147161 0.01328635]
video_auc: 0.9554352353186173
label: [0 1 1 ... 1 0 0]

dummerchen commented 2 days ago

I'm very glad to hear that you've resolved the memory overflow issue. As for the feat list, I might have accidentally deleted it while modifying the code, haha. Thank you so much for sharing your test results; they look exactly the same as mine! Why is there such a large discrepancy compared to the table? @YZY-stack

SCLBD / DeepfakeBench

[HELP!] The published weights fail to reproduce the results in the Readme Table. #109