erprogs / GenConViT

Deepfake Video Detection Using Generative Convolutional Vision Transformer
GNU General Public License v3.0

Multiple faces detected in the inference video #12

linqiu0-0 closed this issue 3 months ago

linqiu0-0 commented 3 months ago

I am reading through the code to see how GenConViT handles multiple faces in a single frame. There appears to be a bug in the inference code for video input, in pred_func at line 34. Because the for loop checks if count < len(frames): before storing each face, frames with several faces use up that limit early, so faces detected in later frames are ignored.

For example, if three faces are detected in the first selected frame, the count will be 3, and the later frames will be ignored. Is this only designed for single-person video detection?
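
To make the example concrete, here is a minimal sketch of the counting logic only, with no face detection involved. The loop shape is reconstructed from the if count < len(frames): check described above, not copied from the repository, and the names collect_capped and collect_all are hypothetical. Each input number stands for how many faces a detector found in that frame.

```python
def collect_capped(faces_per_frame):
    """Mimic the capped loop: stop storing faces once count reaches len(frames)."""
    n_frames = len(faces_per_frame)
    kept = []  # (frame_index, face_index) pairs that would be stored
    count = 0
    for frame_idx, n_faces in enumerate(faces_per_frame):
        for face_idx in range(n_faces):
            if count < n_frames:
                kept.append((frame_idx, face_idx))
                count += 1
            else:
                break
    return kept


def collect_all(faces_per_frame):
    """Uncapped variant: every face from every frame is stored."""
    return [
        (frame_idx, face_idx)
        for frame_idx, n_faces in enumerate(faces_per_frame)
        for face_idx in range(n_faces)
    ]


# Three selected frames: three faces in the first, one in each of the others.
faces_per_frame = [3, 1, 1]
print(collect_capped(faces_per_frame))  # only faces from frame 0 survive
print(collect_all(faces_per_frame))     # faces from all three frames
```

With three frames and three faces in the first one, the cap is reached before the second frame is processed, so frames 1 and 2 contribute nothing.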

erprogs commented 3 months ago

Thank you for pointing that out. It's clearly a bug, and it's not designed for single-person video detection. I'll update the code. Thank you.

It should be something like this, but I haven't tested it.


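import cv2
import dlib
import face_recognition
import numpy as np
from tqdm import tqdm


def face_rec(frames, p=None, klass=None):
    # Collect every detected face from every frame, with no cap on the total.
    temp_face = []
    # Use the CNN detector when dlib was built with CUDA, otherwise HOG.
    mod = "cnn" if dlib.DLIB_USE_CUDA else "hog"

    for frame in tqdm(frames, total=len(frames)):
        # Swap the R and B channels before running detection.
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
        face_locations = face_recognition.face_locations(
            frame, number_of_times_to_upsample=0, model=mod
        )

        for face_location in face_locations:
            # face_recognition returns boxes as (top, right, bottom, left).
            top, right, bottom, left = face_location
            face_image = frame[top:bottom, left:right]
            face_image = cv2.resize(
                face_image, (224, 224), interpolation=cv2.INTER_AREA
            )
            # Swap the channels back for the cropped face.
            face_image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)

            temp_face.append(face_image)

    # Stack the crops into one array and return it with the face count.
    face_array = np.array(temp_face)
    return face_array, len(face_array)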
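
Since the snippet above is untested, one way to check the collection logic without dlib or a GPU is to run the same loop against a stub detector. The following is a sketch under stated assumptions, not the repository's code: collect_faces, fake_detect, and the size parameter are hypothetical names, and a simple index-sampling resize stands in for cv2.resize.

```python
import numpy as np


def collect_faces(frames, detect, size=(224, 224)):
    """Crop every box returned by detect(frame), for every frame, with no cap."""
    crops = []
    for frame in frames:
        # Boxes use the (top, right, bottom, left) order.
        for top, right, bottom, left in detect(frame):
            crop = frame[top:bottom, left:right]
            # Nearest-neighbour resize by index sampling (stand-in for cv2.resize).
            rows = np.linspace(0, crop.shape[0] - 1, size[0]).astype(int)
            cols = np.linspace(0, crop.shape[1] - 1, size[1]).astype(int)
            crops.append(crop[rows][:, cols])
    return np.array(crops), len(crops)


def fake_detect(frame):
    """Stub detector: three boxes for the frame filled with 0, one box otherwise."""
    n_faces = 3 if frame[0, 0, 0] == 0 else 1
    return [(0, 32, 32, 0)] * n_faces


# Three dummy frames; the fill value doubles as the frame index.
frames = [np.full((64, 64, 3), i, dtype=np.uint8) for i in range(3)]
faces, n = collect_faces(frames, fake_detect)
print(faces.shape, n)
```

The stub reports three faces in the first frame and one in each later frame, so an uncapped loop should return five crops, including those from the later frames.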