Cannot reproduce AUC on FF++ test split.

k0l1ka commented 2 years ago

I created a Python environment using the Python version and requirements.txt specified in this repo. For videos from the val and test splits, only the first 110 frames are used for evaluation. I enlarge the face bounding boxes found by RetinaFace by a factor of 1.3, preserving their centers, and feed them into FAN to get landmarks. Only one bbox per face is selected: the one with the highest model confidence. Then I use your scripts for mouth cropping and AUC calculation with the published checkpoint "lipforensics_ff.pth".
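For concreteness, the box selection and enlargement step described above can be sketched roughly as follows. This is only an illustration, not the script used for the paper: the function name `select_and_enlarge`, the `(x1, y1, x2, y2)` box layout, and the optional clipping to the frame are all assumptions.

```python
import numpy as np


def select_and_enlarge(boxes, scores, scale=1.3, frame_size=None):
    """Pick the most confident box and enlarge it around its center.

    boxes:      array-like of shape (N, 4), rows are (x1, y1, x2, y2)  [assumed layout]
    scores:     array-like of shape (N,), detector confidences
    scale:      enlargement factor applied to width and height
    frame_size: optional (width, height); if given, the box is clipped to the frame
    Returns a float array (x1, y1, x2, y2), or None if there are no detections.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64)
    if boxes.shape[0] == 0:
        return None

    # Keep only the detection with the highest confidence.
    x1, y1, x2, y2 = boxes[np.argmax(scores)]

    # Scale width and height while keeping the center fixed.
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = scale * (x2 - x1) / 2.0
    half_h = scale * (y2 - y1) / 2.0
    out = np.array([cx - half_w, cy - half_h, cx + half_w, cy + half_h])

    # Optional: clip to the image. Note that clipping can shift the center.
    if frame_size is not None:
        width, height = frame_size
        out[[0, 2]] = np.clip(out[[0, 2]], 0, width - 1)
        out[[1, 3]] = np.clip(out[[1, 3]], 0, height - 1)
    return out
```

Whether the enlarged box is clipped at the frame border (and whether it is squared first) changes the final mouth crop near the image edges, so this detail may matter when comparing against the reported numbers.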

The video-level AUCs I get on the FaceForensics++ test split at different compression levels are lower than the scores in Table 4 of your paper. Paper: Raw 99.9, HQ 99.7, LQ 98.1. My results with "lipforensics_ff.pth": Raw 98.9, HQ 98.7, LQ 81.3.
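To make clear what "video-level" means here, this is a minimal sketch of one way to compute such an AUC. It assumes that per-clip scores are averaged per video and that label 1 marks the positive class; it is not taken from the repo's evaluation script, and the name `video_level_auc` is hypothetical.

```python
from collections import defaultdict

import numpy as np


def video_level_auc(video_ids, clip_scores, clip_labels):
    """AUC over videos, where a video's score is the mean of its clip scores.

    Assumption: every clip of a video carries the same label (1 = positive).
    """
    per_video = defaultdict(list)
    labels = {}
    for vid, score, label in zip(video_ids, clip_scores, clip_labels):
        per_video[vid].append(float(score))
        labels[vid] = int(label)

    vids = sorted(per_video)
    scores = np.array([np.mean(per_video[v]) for v in vids])
    y = np.array([labels[v] for v in vids])

    pos, neg = scores[y == 1], scores[y == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one video of each class")

    # AUC is the probability that a positive video outscores a negative one,
    # counting ties as half (Mann-Whitney U formulation).
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)
```

A different aggregation (for example, averaging logits instead of probabilities, or scoring clips individually) would give different numbers, so the aggregation rule is one thing worth checking when results do not match.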

So the first question is: did you train the published checkpoint "lipforensics_ff.pth" on the FF++ train set at only one compression level?

If the answer is yes and only one compression level was present in the train set, do you have any idea why the score I get on the FF++ test set at that compression level is at least 1 percentage point lower than yours?

k0l1ka commented 2 years ago

P.S. I've read the similar discussion about AUC on CelebDF and took it into account before opening this issue.

ahaliassos commented 2 years ago

Hi,

The provided checkpoint was trained on FF++ with c23 (HQ) compression, so the HQ results should match.

I'm not sure why they do not currently. Do the final mouth crops look reasonable, i.e., matching the examples from the repo?

k0l1ka commented 2 years ago

Yes, the final mouth crops that I get look reasonable. I compared them with videos in the "examples" folder of this repo.

In addition, I have noticed that the bounding boxes affect the final crops (I compared tight bboxes against enlarged ones), so it would be great if you could provide the bbox enlargement script that was used to obtain the scores reported in Tables 2 and 4 of the paper.

shui-tian-ju-shi commented 4 weeks ago

The paper's robustness experiment on unseen perturbations is a classic one, but I couldn't find the corresponding part in the code. Could you point me to it?

ahaliassos / LipForensics, issue #9