frederickszk / LRNet

Landmark Recurrent Network: An efficient and robust framework for Deepfakes detection
MIT License

Performance problem after adding the other FF++ fake datasets #17

Closed meihsuan0301 closed 2 years ago

meihsuan0301 commented 2 years ago

Hi, I would like to ask whether you have tried adding the other FF++ fake datasets. I tried training with the other fake datasets added, but the model's AUC hardly exceeds 0.7 during training (for both g1 and g2), and it is even worse during testing. I don't know whether you have run into the same situation?

frederickszk commented 2 years ago

Yes, I've updated the training strategy for the PyTorch version of the model. It can now train on the whole FF++ dataset (DF, NT, F2F, FS); you can check the .\training folder for more details. g1 takes more epochs to converge than g2, so you can try adjusting the learning rate to improve the results. Because there may be differences between PyTorch versions, the hyper-parameters I provide may not always be suitable. By the way, I trained g1 on the whole FF++ with lr=0.001 and g2 with lr=0.005.
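For reference, here is a minimal sketch of how those two learning rates could be wired up. The GRU stand-ins for the two branches, the input size, and the choice of Adam are assumptions for illustration only; the real architectures and training loop live in the .\training folder.

```python
# Sketch only (not the repository's training script): setting the learning
# rates mentioned above for the two branches. The stand-in models, input size
# (68 landmarks x 2 coordinates), and Adam optimizer are assumptions.
import torch
import torch.nn as nn

g1 = nn.GRU(input_size=136, hidden_size=64, batch_first=True)  # stand-in for g1
g2 = nn.GRU(input_size=136, hidden_size=64, batch_first=True)  # stand-in for g2

# Learning rates reported in this comment: g1 -> 0.001, g2 -> 0.005.
optimizer_g1 = torch.optim.Adam(g1.parameters(), lr=1e-3)
optimizer_g2 = torch.optim.Adam(g2.parameters(), lr=5e-3)
```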

meihsuan0301 commented 2 years ago

I tested with your provided weights, and the results did not seem to differ much from what I got when training myself. Not sure whether your AUC performance is similar, or am I doing something wrong? [screenshot of test results]

frederickszk commented 2 years ago

Oh, I haven't tested its AUC yet. I've been tied up with other work these days and haven't finished the evaluation code, sorry about that~ I will look into this problem as soon as possible and report back to you 👍 But from my experience with previous experiments, if the ACC reaches around 90%, the AUC is always higher than 0.95. The 0.72 seems a bit low, so there may be some problem. (This holds for LRNet in particular: it tends to give strong predictions, i.e., close to 0 for real and 1 for fake, so its AUC would not be too bad.)
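As a side note, a tiny illustration of the ACC/AUC distinction (not the repository's evaluation code): AUC is computed from the continuous fake scores rather than the thresholded labels, so it measures how well fakes are ranked above reals.

```python
# Toy illustration with scikit-learn; numbers are made up, not LRNet outputs.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

labels = np.array([0, 0, 0, 1, 1, 1])                     # 0 = real, 1 = fake
scores = np.array([0.05, 0.10, 0.60, 0.55, 0.90, 0.95])   # predicted fake probability

print("ACC:", accuracy_score(labels, scores > 0.5))  # 0.833..., from thresholded labels
print("AUC:", roc_auc_score(labels, scores))          # 0.888..., from the raw scores
```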

frederickszk commented 2 years ago

Hello~ I've validated the AUC of the provided weights: it's 0.981 (video-level). [ROC curve and the other metrics are shown in the attached screenshots.] I also updated the evaluation code for AUC; you can find it in .\training\evaluate.py or train.ipynb.
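For readers who want the idea of a video-level AUC without digging into evaluate.py: a minimal sketch follows. The aggregation rule (averaging the per-block fake scores of each video) is an assumption here, not necessarily what the repository does.

```python
# Sketch of video-level AUC (not evaluate.py's actual code): average the
# per-block scores belonging to each video, then compute AUC over videos.
from collections import defaultdict
import numpy as np
from sklearn.metrics import roc_auc_score

def video_level_auc(video_ids, block_scores, video_labels):
    """video_ids: one id per block; block_scores: fake probability per block;
    video_labels: dict mapping video id -> 0 (real) / 1 (fake)."""
    per_video = defaultdict(list)
    for vid, score in zip(video_ids, block_scores):
        per_video[vid].append(score)
    vids = sorted(per_video)
    scores = [float(np.mean(per_video[v])) for v in vids]
    labels = [video_labels[v] for v in vids]
    return roc_auc_score(labels, scores)

# Toy usage with made-up scores.
print(video_level_auc(
    ["a", "a", "b", "b", "c"],
    [0.9, 0.8, 0.2, 0.1, 0.7],
    {"a": 1, "b": 0, "c": 1},
))  # 1.0
```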

meihsuan0301 commented 2 years ago

The number of extracted samples seems to differ from the total count, so with your AUC calculation I get a dimension mismatch. Do you have any suggested way to handle this? [screenshot of the error]

frederickszk commented 2 years ago

My guess is that some hidden files may have slipped into the dataset folders. For example, when I use jupyter-notebook it often auto-generates .checkpoint files, which breaks the dataset loading; could you check whether it's this kind of problem? The data-loading code is mainly around here: https://github.com/frederickszk/LRNet/blob/7d8829c45dc2427fea373da578e1f0d7cb2c8737/training/utils/dataset.py#L157 This function uses the dataset's preset file paths and then calls the following function: https://github.com/frederickszk/LRNet/blob/7d8829c45dc2427fea373da578e1f0d7cb2c8737/training/utils/data.py#L34 which lists the files in the folder and reads them one by one. I didn't add any exception handling there, so it may cause problems; you could try setting a breakpoint there and check whether any abnormal files show up.
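One possible workaround along these lines is to filter hidden entries before reading. This is only a suggested sketch, not code from the repository, and the landmark-file extension here is an assumption.

```python
# Defensive listing helper (suggestion, not repository code): skip hidden
# entries such as Jupyter's auto-generated checkpoint folders before reading.
import os

def list_visible_files(folder, ext=".txt"):
    """Return sorted file names in `folder`, skipping hidden entries
    (names starting with '.') and anything without the expected extension."""
    names = []
    for name in sorted(os.listdir(folder)):
        if name.startswith("."):
            continue                  # e.g. .ipynb_checkpoints, .DS_Store
        if not name.endswith(ext):
            continue                  # extension is an assumption; adjust as needed
        names.append(name)
    return names
```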

meihsuan0301 commented 2 years ago

I found the discrepancy. The cause is that one landmark file has fewer frames extracted, so the calibration probably failed and this sample is missing from test_iter_A and test_iter_B. This looks like a configuration issue in the calibration step. Could you please explain the reason and recommend a solution? [screenshot of the problematic file]

frederickszk commented 2 years ago

Oh, I roughly see the cause now, haha. Among my preset hyper-parameters there is this one: https://github.com/frederickszk/LRNet/blob/7d8829c45dc2427fea373da578e1f0d7cb2c8737/training/evaluate.py#L19 It means every 60 frames form one sample (block), while your video ends up with only 32 detected frames, hence the problem. Specifically, when execution reaches this line: https://github.com/frederickszk/LRNet/blob/7d8829c45dc2427fea373da578e1f0d7cb2c8737/training/utils/data.py#L65 the file has fewer rows than one block, so the for loop is skipped entirely, but the video's label was already appended to the dataset before this line. As a result, the counts no longer match, and the labels of all data read afterwards are shifted. This is a small bug I hadn't run into before, thanks for reporting it! I'll fix it later.
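To make the label/sample mismatch concrete, here is a small sketch with hypothetical names (not the repository's data.py): if the label is appended together with each block inside the loop, a clip shorter than one block contributes neither samples nor labels, so the counts stay in sync.

```python
# Sketch of the sync issue described above (hypothetical names, not data.py).
import numpy as np

def split_into_blocks(landmarks, label, block_size=60):
    """landmarks: (num_frames, feature_dim) array for one video.
    Returns (blocks, labels); both are empty if the clip is too short."""
    blocks, labels = [], []
    for start in range(0, len(landmarks) - block_size + 1, block_size):
        blocks.append(landmarks[start:start + block_size])
        labels.append(label)     # label added per block, never ahead of it
    return blocks, labels

# A 32-frame clip yields no blocks and no labels, instead of a dangling label.
print(split_into_blocks(np.zeros((32, 136)), label=1))   # ([], [])
```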


As for a solution: judging from the file name, this should be the landmark file for 638_640.mp4 in the DF dataset. The problem is not with landmark extraction itself; the video is simply a bad sample. If you look at it in the dataset, the face is basically a mosaic, so both face detection and landmark detection fail. I therefore excluded it from the dataset directly, since this data doesn't help the model training anyway, hh. By the way, there are three videos with this kind of problem in the DF fake-face dataset. I confirmed them one by one earlier and removed them from the dataset; you can check them yourself: the three videos starting with 365, 569, and 638. That's why the packed DF landmark training set I provide contains only 797 samples (all the others have 800).
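If you rebuild the file list yourself, one way to drop those clips is a simple prefix blacklist. This is only a sketch, not the repository's code, and the folder layout and file naming are assumptions.

```python
# Sketch (not repository code): exclude the three corrupted DF clips mentioned
# above when building the training file list, assuming landmark files are named
# after the source videos (e.g. "638_640.*") in a flat folder.
import os

BAD_DF_PREFIXES = ("365", "569", "638")   # mosaic faces; detection fails on them

def list_df_landmark_files(folder):
    files = []
    for name in sorted(os.listdir(folder)):
        if name.startswith(BAD_DF_PREFIXES):
            continue                       # drop the known-bad samples
        files.append(os.path.join(folder, name))
    return files
```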