ahaliassos / RealForensics

Official code for Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection (CVPR 2022)
MIT License

How are multiple faces handled in preprocessing? #2

Open spirosbax opened 1 year ago

spirosbax commented 1 year ago

Hi, and congratulations on your work! I'm trying to reproduce your results and am having trouble preprocessing the FF++ dataset with your code. I have computed the landmarks for each video as per the instructions. The extract_faces.py script then fails with the error ValueError: operands could not be broadcast together with shapes (68,2) (136,2) in this line of the crop_patch function.

It seems the code expects exactly one set of landmarks per frame. Since FF++ and other datasets contain frames with multiple faces, how is this handled? The code could run if the landmarks were stored as a list of one or more (68,2) arrays instead of a single numpy array, but because the landmarks are smoothed over time, mixing faces at different locations would produce very jittery movement. How do you handle this case? One option might be to track each face separately and write one .avi video per face, stored as vidname_{0}.avi, vidname_{1}.avi, vidname_{2}.avi, etc.
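The shape mismatch can be reproduced in isolation. The snippet below is only a minimal sketch: it assumes the landmark detector stacked two faces' 68 points into one (136, 2) array, and the variable names are hypothetical. It is not the repository's crop_patch code.

```python
import numpy as np

# Hypothetical reference landmarks for a single face: 68 (x, y) points.
reference = np.zeros((68, 2))

# Assumption: two detected faces stacked along axis 0 give 136 points.
two_faces = np.ones((136, 2))

try:
    reference - two_faces  # (68, 2) vs (136, 2): shapes are incompatible
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Giving each face its own leading index makes the shapes compatible:
# (2, 68, 2) broadcasts against (68, 2).
per_face = two_faces.reshape(2, 68, 2)
offsets = per_face - reference
```

The reshape only makes sense if the first 68 rows belong to one face and the next 68 to the other, which is itself an assumption about how the landmarks were saved.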

ahaliassos commented 1 year ago

Hi,

Indeed, the code assumes that each video contains one face that needs to be extracted. For example, in FF++, only the largest face is tracked and extracted (see Appendix A in https://arxiv.org/pdf/1901.08971.pdf). To extract multiple faces, you would need to track each face and produce landmarks with an extra dimension (e.g., the shape of the landmarks would be (3, 68, 2) for three faces in a frame). Then with slight modifications to the code, it would be possible to crop and align the faces.
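A rough sketch of what such a modification could look like follows. It assumes the landmarks are already tracked (face i in one frame is the same person in the next) and stored with shape (num_frames, num_faces, 68, 2). The function names largest_face, split_tracks, and smooth_track are hypothetical and not part of the RealForensics code, and the moving average merely stands in for whatever smoothing the repository applies.

```python
import numpy as np


def largest_face(frame_landmarks: np.ndarray) -> np.ndarray:
    """Pick the face with the largest landmark bounding box.

    frame_landmarks has shape (num_faces, 68, 2); the result is (68, 2).
    """
    # Width and height of each face's bounding box, shape (num_faces, 2).
    extents = frame_landmarks.max(axis=1) - frame_landmarks.min(axis=1)
    areas = extents[:, 0] * extents[:, 1]
    return frame_landmarks[int(np.argmax(areas))]


def split_tracks(landmarks: np.ndarray) -> list:
    """Split (num_frames, num_faces, 68, 2) into one (num_frames, 68, 2) array per face."""
    return [landmarks[:, i] for i in range(landmarks.shape[1])]


def smooth_track(track: np.ndarray, window: int = 5) -> np.ndarray:
    """Centered moving average over time for a single face track."""
    half = window // 2
    smoothed = np.empty_like(track, dtype=float)
    for t in range(len(track)):
        lo, hi = max(0, t - half), min(len(track), t + half + 1)
        # Average only this face's landmarks, so other faces never mix in.
        smoothed[t] = track[lo:hi].mean(axis=0)
    return smoothed


# Example with synthetic data: 10 frames, 3 faces per frame.
rng = np.random.default_rng(0)
landmarks = rng.random((10, 3, 68, 2))
smoothed_tracks = [smooth_track(t) for t in split_tracks(landmarks)]
```

Each entry of smoothed_tracks could then be cropped and written to its own video, for example vidname_{0}.avi, vidname_{1}.avi, and so on. Keeping only largest_face per frame instead mirrors the single-face behaviour described above.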