davide-coccomini / Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection

Code for Video Deepfake Detection model from "Combining EfficientNet and Vision Transformers for Video Deepfake Detection" presented at ICIAP 2021.
https://dl.acm.org/doi/abs/10.1007/978-3-031-06433-3_19
MIT License

Potential bug for detecting deepfake video if there are multiple people. #7

Closed ltdt-apex closed 3 years ago

ltdt-apex commented 3 years ago

In the preprocessing code (detect_face.py and extract_crop.py), the order of faces within each frame does not appear to be tracked, but the evaluation code in test.py seems to assume that the order of faces is consistent across all frames.

Because of that, the code in test.py may exhibit unexpected behavior, such as mixing different people's faces into the same group and producing wrong predictions.

Am I misunderstanding something, or is the model not intended to be deployed on videos with multiple faces?
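To make the concern concrete, here is a minimal sketch (not the repository's code, all names are hypothetical) of how detected faces could be associated across frames by bounding-box overlap rather than by detection order, so that crops from different people would not end up in the same group:

```python
# Hypothetical sketch: group face detections into per-person tracks using
# bounding-box IoU, instead of trusting the order MTCNN returns them in.

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_faces_by_identity(frames_boxes, iou_threshold=0.5):
    """Greedily assign each box to the track whose last box overlaps it most.

    frames_boxes: one list of (x1, y1, x2, y2) boxes per frame, in whatever
    order the detector produced them.
    Returns a list of tracks, each a list of (frame_index, box) tuples.
    """
    tracks = []
    for frame_idx, boxes in enumerate(frames_boxes):
        for box in boxes:
            best_track, best_iou = None, iou_threshold
            for track in tracks:
                overlap = iou(track[-1][1], box)
                if overlap > best_iou:
                    best_track, best_iou = track, overlap
            if best_track is not None:
                best_track.append((frame_idx, box))
            else:
                tracks.append([(frame_idx, box)])
    return tracks
```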

davide-coccomini commented 3 years ago

Your observation is correct. Due to the way MTCNN extracts faces, some can indeed be mixed up, but according to our observations this is quite a rare event, especially for the kind of videos you are considering. One possible strategy to mitigate this is to use a larger number of faces for each subject, which we also explore in relation to performance in the paper. In any case, the more pragmatic approach remains to classify all faces indiscriminately and extract a single value.
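A minimal sketch of what "classify all faces indiscriminately and extract a single value" could look like; the mean aggregation, the 0.5 threshold, and the function name are illustrative assumptions, not necessarily the exact logic used in test.py:

```python
import numpy as np


def video_level_score(per_face_probs, threshold=0.5):
    """Aggregate per-face fake probabilities into one video-level decision.

    per_face_probs: iterable of sigmoid outputs, one per face crop, taken over
    all faces detected in the video regardless of which person they belong to.
    Returns (mean_probability, is_fake).
    """
    probs = np.asarray(list(per_face_probs), dtype=np.float32)
    mean_prob = float(probs.mean())
    return mean_prob, mean_prob > threshold
```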

ltdt-apex commented 3 years ago

Thanks for the answer.