frederickszk / LRNet

Landmark Recurrent Network: An efficient and robust framework for Deepfakes detection
MIT License

I can't reproduce the paper results. #7

Open · meihsuan0301 opened this issue 2 years ago

meihsuan0301 commented 2 years ago

I used the demo code and tried some fake videos from FF++, but the performance I got is not good: many fake videos are identified as real. (I used the FF++ g1.h5 and g2.h5 in model_weights.) Did I do something wrong? [screenshot]

frederickszk commented 2 years ago

Yes, this situation is expected 😭. When we released the demo, the FF++ dataset only included DF (1000 real + 1000 fake). It now consists of 4 forgeries (DF, NT, FS, F2F). Our initial model overfits its training data distribution, so when you evaluate FF++ samples that do not belong to DF, the results may not be good. We are working on this drawback; you can check the ./training folder for more details. In the current progress, g2 achieves great performance on the whole FF++, but g1 still has some problems. We plan to update the weights (trained on the whole FF++) and the demo in a few days. You can wait for this update or train it yourself.

meihsuan0301 commented 2 years ago

Thanks for your answer. Then I want to confirm with you: the experimental results in the paper are only for the real and FF++ DF data. Is my understanding correct? (I got similar results using ./training/weights/tf.)

frederickszk commented 2 years ago

Yes, the results reported in the paper are only for the DF dataset, because at that time FS/NT were often seen as separate datasets rather than part of FF++. Although we also stated this in the experiment settings section of the paper, I just checked it and found the description somewhat vague; sorry for the confusion 🙏. (The weights in ./training/ are also for the separate datasets for now, and I plan to update the weights for the whole FF++ soon. Thanks for the question~ 😄)

meihsuan0301 commented 2 years ago

Understood, thanks for your answer. Looking forward to your updated results.

meihsuan0301 commented 2 years ago

Sorry to ask again. I cut a total of 133 videos from the real and DF datasets of FF++, but my results are still not very good; the real part seems to be more prone to errors. Do you have any clue why this problem occurs? [screenshot] Here is the confusion matrix: [screenshot]
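
For reference, a confusion matrix like this one can be computed from per-video labels and predictions; a minimal sketch using scikit-learn, where the `labels`/`preds` arrays are placeholders standing in for the actual evaluation output:

```python
from sklearn.metrics import confusion_matrix

# 0 = real, 1 = fake; placeholder arrays standing in for the
# ground-truth labels and model predictions of the evaluated videos.
labels = [0, 0, 1, 1, 1]
preds  = [0, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes:
# [[real->real, real->fake],
#  [fake->real, fake->fake]]
cm = confusion_matrix(labels, preds, labels=[0, 1])
print(cm)
```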

frederickszk commented 2 years ago

This seems abnormal. I've just verified the performance on DF (with the code, dataset, and weights in the ./training directory). The TensorFlow version (same as the demo): [screenshot] And the PyTorch version: [screenshot] These evaluations are carried out on the last 200 real and 200 fake videos of DF (the other 800 pairs are used for training). Normally the model should correctly classify every test sample in DF. Maybe you could try the weights in ./training (the demo's weights are old), or check whether the face/landmark detector failed on some samples (the Dlib detector is also somewhat out of date).
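
As a rough way to check the detector, one could scan each video and count the frames where Dlib finds no face; a minimal standalone sketch (the video path is a placeholder, and this check is separate from the repo's own pipeline):

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()

def count_detection_failures(video_path):
    """Return (frames_read, frames_with_no_detected_face)."""
    cap = cv2.VideoCapture(video_path)
    total, failures = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        total += 1
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # The second argument upsamples the image once, which helps
        # with small faces at the cost of speed.
        if len(detector(gray, 1)) == 0:
            failures += 1
    cap.release()
    return total, failures

total, failures = count_detection_failures("example_video.mp4")
print(f"{failures}/{total} frames had no detected face")
```

Videos with a high failure rate would produce broken landmark sequences, which could explain real samples being misclassified.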

meihsuan0301 commented 2 years ago

I have found the bug in my program, thank you for your answer.

YU-SHAO-XU commented 2 years ago

[screenshot]

Hi, why is the training accuracy always 0.5 when I use the code in ./training?! Did I do something wrong?!

frederickszk commented 2 years ago

> I have found the bug in my program, thank you for your answer.

That's fine~ 👍 Feel free to contact me if you run into other problems.

frederickszk commented 2 years ago

@YU-SHAO-XU I've verified the training code several times; this situation is abnormal 🤔. You could try the steps below, which might be helpful:

  1. Check that the dataset files are placed correctly.
  2. Differences in the training procedure can result from different PyTorch versions. Try increasing or decreasing the learning rate (e.g., 0.001) and observe whether the model converges (see the sketch below).
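
For illustration, adjusting the learning rate in PyTorch looks like this; a minimal sketch where the model is a placeholder, not the actual network in train.py:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the actual classifier in train.py.
model = nn.Linear(136, 2)

# Start from the suggested value and watch whether the loss drops.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# If training stalls at 0.5 accuracy, lower (or raise) the rate, e.g.:
for group in optimizer.param_groups:
    group["lr"] = 1e-4
```
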
YU-SHAO-XU commented 2 years ago

@frederickszk Hi, do you mean the dataset is composed of txt files? Or should I put the FF++ videos in the folders? Yes, I have prepared the datasets in ./datasets/ and deposited them into each empty folder. Thank you.

frederickszk commented 2 years ago

@YU-SHAO-XU Yes, the dataset is composed of txt files. There should be 800 txt files in "Origin/c23/train" and "DF/c23/train", and the remaining 200 txt files in the corresponding "/test" folders. If the dataset is configured correctly, then it would be problem 2 I mentioned above. You can try modifying the learning rate (in train.py line 75) to make the model converge.

I've also tested just now; the normal procedure should look like this: [screenshot]

YU-SHAO-XU commented 2 years ago

@frederickszk [screenshot]

I found that I have 1600 txt files in "Origin/c23/train" and "DF/c23/train", and the remaining 400 txt files in the "/test" folders!! But I downloaded them from your Google Drive links!! So I think it may be problem 1.

frederickszk commented 2 years ago

@YU-SHAO-XU Yes, I think that may be the problem. You could check exactly how many txt files are in each dataset folder; for example, there should be 200 files in DF/c23/test (from 800_840.txt to 999_960.txt). I've checked my uploaded files on Google Drive and found no problems with them.
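
A quick way to verify the counts; a minimal sketch assuming the layout described above (a datasets/ root with Origin and DF under c23):

```python
from pathlib import Path

# Expected layout from the discussion above: 800 txt files per class
# in train/, and 200 per class in test/.
for split, expected in [("train", 800), ("test", 200)]:
    for cls in ["Origin", "DF"]:
        folder = Path("datasets") / cls / "c23" / split
        n = len(list(folder.glob("*.txt")))
        status = "OK" if n == expected else f"expected {expected}"
        print(f"{folder}: {n} txt files ({status})")
```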

YU-SHAO-XU commented 2 years ago

@frederickszk Many thanks, it works~