YU-SHAO-XU opened 2 years ago
Q1: The backward LK is calculated in the same way as the forward LK: in the forward pass we use frame [i] to predict frame [i+1], while in the backward pass we use frame [i+1] to predict frame [i] with the same algorithm; we just swap the frame order.
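To make the forward-backward idea concrete, here is a minimal, self-contained sketch. A simple block-matching search plays the role of the LK tracker; the function names, the SSD matching, and the `tol` threshold are my own assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def track_point(src, dst, pt, patch=5, search=6):
    """Toy stand-in for LK tracking: find the integer position in `dst` whose
    patch best matches (minimum SSD) the patch around `pt` in `src`."""
    y, x = pt
    ref = src[y - patch:y + patch + 1, x - patch:x + patch + 1]
    best, best_err = pt, np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = y + dy, x + dx
            cand = dst[ny - patch:ny + patch + 1, nx - patch:nx + patch + 1]
            if cand.shape != ref.shape:   # skip out-of-bounds windows
                continue
            err = float(((cand - ref) ** 2).sum())
            if err < best_err:
                best_err, best = err, (ny, nx)
    return best

def forward_backward_check(frame_i, frame_j, pt, tol=1.5):
    """Track [i] -> [i+1], then [i+1] -> [i] with the same algorithm (frame
    order swapped); flag the prediction for discarding if the round trip
    does not come back close to the starting point."""
    fwd = track_point(frame_i, frame_j, pt)    # forward pass
    bwd = track_point(frame_j, frame_i, fwd)   # backward pass
    err = np.hypot(bwd[0] - pt[0], bwd[1] - pt[1])
    return fwd, err <= tol

# Synthetic frames: a Gaussian blob moves by (+2, +3) pixels between frames.
def blob(h, w, cy, cx):
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    return np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / 8.0)

f0 = blob(40, 40, 20, 20)
f1 = blob(40, 40, 22, 23)
pred, ok = forward_backward_check(f0, f1, (20, 20))
print(pred, ok)   # -> (22, 23) True
```

On clean motion the round-trip error is tiny and the prediction is kept; a point whose backward track lands far from the start would be discarded.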
Q2: Yes, the denoising step is applied to both real and fake videos. Our proposed calibrator runs after the general pre-processing procedure (face detection, landmark detection, and alignment) that any other Deepfake detector would also adopt; most detectors then use the image part, while we use the landmarks. As you said, a Deepfake video is noisier, which may disturb landmark extraction. We observed exactly this phenomenon, and that is why we designed the calibrator. The calibrator works on the image pixels to refine each landmark's position. For example, suppose frame [i] has a point on the eye corner; in the next frame [i+1], the landmark detector may be misled by the Deepfake's noise and place the landmark slightly away from the eye corner. Our calibrator then computes a correction based on the image patch and pulls the point back to the right place. In other words, we try to remove the noise introduced by landmark extraction while preserving the Deepfake's own noise (discontinuity, etc.) for better discrimination. Our ablation study supports this.
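The "pull it back to the right place" step might be sketched like this. The function name `refine_landmark` and the SSD patch search are hypothetical stand-ins for illustration, not LRNet's actual calibrator code:

```python
import numpy as np

def refine_landmark(prev, curr, prev_pt, noisy_pt, patch=4, search=3):
    """Hypothetical calibrator sketch: search a small neighbourhood of the
    noisy detection in frame [i+1] for the position whose image patch best
    matches (minimum SSD) the patch around the trusted landmark in frame [i]."""
    py, px = prev_pt
    ref = prev[py - patch:py + patch + 1, px - patch:px + patch + 1]
    best, best_err = noisy_pt, np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = noisy_pt[0] + dy, noisy_pt[1] + dx
            cand = curr[ny - patch:ny + patch + 1, nx - patch:nx + patch + 1]
            if cand.shape != ref.shape:   # skip out-of-bounds windows
                continue
            err = float(((cand - ref) ** 2).sum())
            if err < best_err:
                best_err, best = err, (ny, nx)
    return best

# Synthetic "eye corner": a step-edge corner at (20, 20) in both frames.
# The detector drifts to (22, 19) in frame [i+1]; the patch search pulls it back.
img = np.zeros((40, 40))
img[20:, 20:] = 1.0
fixed = refine_landmark(img, img, (20, 20), (22, 19))
print(fixed)   # -> (20, 20)
```

The detector-induced jitter is corrected, while any pixel-level artifacts of the fake video itself remain in the frames and still shape the result.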
Q3: Yes. When we submitted the paper (around 2020.11), some research, including ours, still used "FF++" to refer to the single DF dataset, so every result marked FF++ in the paper is in fact on DF only, not the average over (DF, NT, FS, F2F). Since recent work usually treats the whole FF++ as a single dataset, we have also updated our training strategy and preliminarily verified that it achieves similar performance on the whole FF++ (you can check the model weights in our training code). We will keep updating the model and improving the evaluations. In short, if you want to compare against the initial version of LRNet, the results reported in the paper are, from the current viewpoint, on the single DF dataset. Sorry for causing the misunderstanding~
Re Q1: The end of Section 3.2.1 (Tracking) explains that the LK operation is not always successful. So how can you track a point if you discard the predicted point whenever there is a large difference between the original point and the backward-LK point, like the right eye corner in the bottom picture? It would not be passed to the step-2 Kalman filter.
Re Q2, Q3: Okay, I got it.
Yes, if the predicted point is discarded, we do not pass it to the step-2 Kalman filter, but use its originally detected result as the output. That is, we skip this point when refining the frame [i+1] points; note that frame [i+1] also has detected points from the landmark detector. Besides, our calibrator works in a "one-by-one" manner: we first use [i] to calibrate [i+1]. Although the right eye corner failed in this round, when we then use [i+1] to calibrate [i+2], we can still use its originally detected point for the calculation (while the other points would be refined ones). In our code, you can check here: https://github.com/frederickszk/LRNet/blob/61e4da6e6b549e3b52adce15a0876b1353cedb5f/calibrator/LC.py#L131 This array controls whether a point is discarded.
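The fallback logic described above (discarded points keep their raw detection, the rest are fused by the Kalman filter) might look roughly like the following. The scalar constant-position filter and all parameter values are assumptions for illustration, not the actual LC.py code:

```python
def kalman_fuse(pred, detected, p, q=0.01, r=0.25):
    """One scalar Kalman update: the LK prediction `pred` is the prior (its
    variance grows by process noise q), the detector output `detected` is
    the measurement (variance r)."""
    p = p + q                        # predict step inflates uncertainty
    k = p / (p + r)                  # Kalman gain
    return pred + k * (detected - pred), (1.0 - k) * p

def calibrate(predictions, detections, discard):
    """Per-point fusion for one coordinate of each landmark: if the
    forward-backward check discarded the LK prediction, keep the raw
    detection as the output; otherwise fuse prediction and detection."""
    out, p = [], 1.0
    for pred, det, bad in zip(predictions, detections, discard):
        if bad:                      # failed consistency check -> raw detection
            out.append(det)
        else:
            x, p = kalman_fuse(pred, det, p)
            out.append(x)
    return out

# Three landmarks: the middle one failed the forward-backward check.
coords = calibrate([10.0, 11.0, 12.0], [10.2, 30.0, 12.1], [False, True, False])
print(coords)   # middle entry is the raw detection (30.0), the others are fused
```

The discard flags here play the same role as the control array in LC.py: a failed point passes through untouched, so a single bad track never blocks the calibration of the next frame pair.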
@frederickszk I got it, thank you~
Hi
Could you provide the DF, NT, FS, F2F test results individually?
Hello~
The performance may be unstable because I'm still updating the model's structure. Therefore I haven't performed comprehensive and precise evaluations, but I can report the results of the provided weights in .\training\weights\xx_all.pth. They are trained on the whole FF++ and evaluated separately on DF, NT, F2F, FS:
Dataset | Acc (%) | AUC (%)
---|---|---
DF | 92.50 | 99.00
NT | 90.00 | 95.87
F2F | 92.00 | 98.69
FS | 92.25 | 98.97
Hi
Isn't the DF AUC 99.9%?
Yes, the 99.9% AUC is the performance of the older version, which was trained and tested only on DF. The provided weights are now trained on the whole FF++ dataset, so on the single DF dataset the AUC drops slightly. Of course, there is room for improvement.
Hi, here are some questions about the paper.
1. The predicted point with a large difference between its original point and the backward-LK point will be discarded (dotted arrows). So how can we predict it? (the blue circle)
2. Deepfake videos are usually noisier, so is the denoising step also applied to fake videos? Does it influence the classification of fake videos?
3. You only show the results of some of the best-performing methods. So is the reported 99.9 for FF++ only on DF, not the average of (DF, NT, FS, F2F), while the other methods' accuracies are averages over (DF, NT, FS, F2F)?
Please correct me if I have misunderstood! Thank you so much~