How are multiple faces handled in preprocessing?

Hi, and congratulations on your work! I'm trying to reproduce your results, and I'm having trouble preprocessing the FF++ dataset with your code. I computed the landmarks for each video as described in the instructions, but running the extract_faces.py script fails with:

ValueError: operands could not be broadcast together with shapes (68,2) (136,2)

The error is raised in this line of the crop_patch function. It seems that the function expects exactly one set of 68 landmarks per frame, while a (136,2) array presumably holds the landmarks of two faces. Since FF++ and other datasets contain videos with multiple faces, how is this handled?

The code could run if each frame's landmarks were a list of one or more (68,2) arrays instead of a single NumPy array. However, because the landmarks are smoothed over time, this would produce very jittery crops, since the smoothing would mix faces at different locations. How do you handle this case? One option might be to track each face separately and save one .avi video per face, e.g. vidname_{0}.avi, vidname_{1}.avi, vidname_{2}.avi, etc.
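For illustration, here is a minimal sketch of the per-face idea. It assumes that a frame with k faces stores a (68*k, 2) array with faces concatenated in consecutive 68-point blocks; this layout is inferred from the (136,2) shape in the error and is not confirmed by the repository. NumPy cannot broadcast (68,2) against (136,2) because the first dimensions differ and neither is 1, so the sketch first reshapes each frame to (k, 68, 2). The helpers split_faces, track_faces and smooth_track are hypothetical and not part of RealForensics. Tracking uses a simple greedy nearest-centroid match, and smoothing uses a centered moving average, which may differ from whatever smoothing extract_faces.py actually applies.

```python
import numpy as np

N_POINTS = 68  # points per face in the 68-point landmark scheme


def split_faces(frame_landmarks):
    """Reshape a (68*k, 2) array into (k, 68, 2), one block per face.

    Assumes (hypothetically) that faces are stored as consecutive 68-point blocks.
    """
    arr = np.asarray(frame_landmarks, dtype=float)
    if arr.ndim != 2 or arr.shape[0] % N_POINTS != 0:
        raise ValueError(f"unexpected landmark shape {arr.shape}")
    return arr.reshape(-1, N_POINTS, 2)


def track_faces(frames):
    """Assign faces to tracks by greedy nearest-centroid matching.

    frames: list of (68*k, 2) arrays, one per video frame. The number of
    tracks is fixed by the first frame; if a face is missing in a later
    frame, that track gets NaN landmarks for the frame.
    Returns an array of shape (num_tracks, num_frames, 68, 2).
    """
    first = split_faces(frames[0])
    num_tracks = first.shape[0]
    tracks = np.full((num_tracks, len(frames), N_POINTS, 2), np.nan)
    tracks[:, 0] = first
    last_centroid = first.mean(axis=1)  # (num_tracks, 2)

    for t, frame in enumerate(frames[1:], start=1):
        faces = split_faces(frame)
        centroids = faces.mean(axis=1)  # (k, 2)
        # Pairwise distances between previous track centroids and current faces.
        dist = np.linalg.norm(
            last_centroid[:, None, :] - centroids[None, :, :], axis=-1
        )
        used_tracks, used_faces = set(), set()
        # Visit (track, face) pairs from closest to farthest; take each once.
        for flat_idx in np.argsort(dist, axis=None):
            tr, fc = np.unravel_index(flat_idx, dist.shape)
            if tr in used_tracks or fc in used_faces:
                continue
            tracks[tr, t] = faces[fc]
            last_centroid[tr] = centroids[fc]
            used_tracks.add(tr)
            used_faces.add(fc)
    return tracks


def smooth_track(track, window=5):
    """Centered moving average over time for one track of shape (T, 68, 2).

    NaN frames are ignored by nanmean; a window that is entirely NaN stays NaN.
    """
    num_frames = track.shape[0]
    half = window // 2
    out = np.empty_like(track)
    for t in range(num_frames):
        lo, hi = max(0, t - half), min(num_frames, t + half + 1)
        out[t] = np.nanmean(track[lo:hi], axis=0)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.uniform(0, 50, size=(N_POINTS, 2))
    frames = []
    for t in range(10):
        face_a = base + [100 + t, 100]  # face drifting right
        face_b = base + [400, 200 - t]  # face drifting up
        # Alternate the storage order so tracking has to restore identity.
        pair = (face_a, face_b) if t % 2 == 0 else (face_b, face_a)
        frames.append(np.concatenate(pair))  # shape (136, 2), as in the error

    tracks = track_faces(frames)
    for i, track in enumerate(tracks):
        smoothed = smooth_track(track)
        # Each smoothed track would then be cropped and written to its own file.
        print(f"vidname_{i}.avi", smoothed.shape)
```

Greedy nearest-centroid matching is only a starting point: it can swap identities when faces cross or when faces enter and leave the shot. An optimal assignment per frame (for example with scipy.optimize.linear_sum_assignment on the same distance matrix) or a dedicated face tracker would be more robust.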

ahaliassos / RealForensics

How are multiple faces handled in preprocessing? #2