Open meihsuan0301 opened 2 years ago
Yes, this situation would be right 😭. When we released the demo, the FF++ dataset only included DF (1000 real + 1000 fake videos). However, it now consists of 4 forgery types (DF, NT, FS, and F2F). Our initial model overfits its training data distribution, so when you evaluate samples in FF++ that do not belong to DF, the results may not be good.
We are also working on this drawback. You can check the ./training folder for more details. According to the current progress, g2 can achieve great performance on the whole FF++, but g1 still has some problems. We plan to update the new weights (trained on the whole FF++) in a few days, along with the demo. You can wait for this update or train it yourself.
Thanks for your answer. Then I want to confirm with you: the experimental results in the paper cover only the real and FF++ DF data, is my understanding correct? (Using ./training/weights/tf, I got similar results.)
Yes, the results reported in the paper are only for the DF dataset, because at that time FS/NT were often treated as separate datasets rather than as part of FF++. Although we also stated this in the experiment settings section of the paper, I have just checked it and the description is somewhat vague; sorry for the confusion 🙏. (The weights in ./training/ are currently also for the separate datasets, and I plan to update the weights for the whole FF++ soon. Thanks for the question~ 😄)
Understood, thanks for your answer, looking forward to your updated results.
Sorry to ask again. I extracted a total of 133 videos from the real and DF datasets of FF++, but my results are still not very good; the real part seems especially prone to errors. Do you have any clues why this problem occurs?
Here is the confusion matrix.
This seems to be abnormal. I've just verified the performance on DF (with the code, dataset, and weights in the ./training directory).
The tensorflow version (same as the demo):
Also the PyTorch version:
These evaluations were carried out on the last 200 real and 200 fake videos of DF (the other 800 pairs are used for training). Normally the model should correctly classify every test sample in DF.
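For reference, that split convention can be sketched in a few lines. This is only my reading of the thread (FF++ videos numbered 000–999, the last 200 pairs held out for testing); the repo's actual split code may differ:

```python
# Sketch of the DF train/test split described above (assumption:
# 1000 real/fake video pairs, indices 000-999; the last 200 pairs
# are the test set and the first 800 pairs are for training).
def split_ff_df(num_pairs=1000, num_test=200):
    """Return (train_ids, test_ids) as zero-padded video indices."""
    ids = [f"{i:03d}" for i in range(num_pairs)]
    return ids[:-num_test], ids[-num_test:]

train_ids, test_ids = split_ff_df()
print(len(train_ids), len(test_ids))  # 800 200
print(test_ids[0], test_ids[-1])      # 800 999
```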
Maybe you could try the weights in ./training (the demo's version is old), or check whether the face/landmark detector failed on some samples (the Dlib detector is also somewhat out of date).
I have found the bug with my program, thank you for your answer.
Hi, why is the accuracy always 0.5 when I use the code in training?! Did I do something wrong somewhere?!
> I have found the bug with my program, thank you for your answer.
That's fine~ 👍 Feel free to contact me if you meet other problems.
@YU-SHAO-XU I've verified the training code several times; this situation would be abnormal 🤔. You could try the operations below, which might be helpful:
@frederickszk Hi, do you mean the dataset is composed of txt files? Or should I put the FF++ videos in the folder? Yes, I have prepared the datasets in the ./datasets/ folder and placed them into each empty subfolder. Thank you.
@YU-SHAO-XU
Yes, the dataset is composed of txt files. There should be 800 txt files in "Origin/c23/train" and "DF/c23/train", and the remaining 200 txt files in the respective "/test" folders.
If the dataset is configured correctly, that would be the [problem 2] I mentioned above. You can try modifying the learning rate (in train.py, line 75) to make the model converge.
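As a toy illustration of why the learning rate matters here (this is plain gradient descent on f(w) = w², not the repo's training code): a rate that is too large makes the updates oscillate and diverge, which in a classifier shows up as accuracy stuck near 0.5.

```python
# Toy example (illustration only, not train.py): minimize f(w) = w^2
# with gradient descent. The gradient of w^2 is 2w, so each step is
# w <- w - lr * 2w. Too-large lr diverges; smaller lr converges to 0.
def descend(lr, steps=50, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w
    return abs(w)

print(descend(lr=1.1))  # diverges: |w| grows every step
print(descend(lr=0.1))  # converges: |w| shrinks toward 0
```

The same intuition applies when tuning the learning rate in train.py: if training never moves off chance accuracy, trying a smaller value is a reasonable first step.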
I've also tested just now, the normal procedure should be like this:
@frederickszk
I found that I have 1600 txt files in "Origin/c23/train" and "DF/c23/train", and the remaining 400 txt files in the "/test" folders!! But I downloaded them from your Google Drive links!! So I think it may be problem 1.
@YU-SHAO-XU Yes, I think that may be the problem. You could check exactly how many txt files are in each dataset folder; for example, there should be 200 files in DF\c23\test (from 800_840.txt to 999_960.txt). I've checked my uploaded files on Google Drive and found no problems with them.
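A quick way to check those counts is a small helper like the sketch below. The ./datasets/... paths and expected counts are my assumptions based on this thread; adjust them to your actual layout:

```python
# Hypothetical helper to verify the dataset layout discussed above:
# each split folder should contain the expected number of txt files.
import os

def count_txt(folder):
    """Count .txt files directly inside `folder` (0 if it is missing)."""
    if not os.path.isdir(folder):
        return 0
    return sum(1 for name in os.listdir(folder) if name.endswith(".txt"))

# Assumed layout; edit these paths/counts to match your setup.
expected = {
    "./datasets/Origin/c23/train": 800,
    "./datasets/Origin/c23/test": 200,
    "./datasets/DF/c23/train": 800,
    "./datasets/DF/c23/test": 200,
}
for folder, n in expected.items():
    found = count_txt(folder)
    status = "OK" if found == n else "MISMATCH"
    print(f"{folder}: {found} txt files (expected {n}) {status}")
```

Any folder reporting a mismatch (e.g. double the expected count) would point to duplicated or misplaced files.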
@frederickszk Many thanks it works ~
I used the code in the demo and tried some fake videos from FF++, but the performance I got is not good: many fake videos are identified as real. (I used the FF++ g1.h5 and g2.h5 in model_weights.) Did I do something wrong somewhere? ![image](https://user-images.githubusercontent.com/72007743/161909818-ea9d52f6-f635-490f-bd0b-7ef810cbd910.png)