Closed: linqiu0-0 closed this issue 3 months ago
Thank you for pointing that out. It's clearly a bug; the code is not meant to be limited to single-person video detection. I'll update it.
It should be something like this, though I haven't tested it yet:
```python
import cv2
import dlib
import face_recognition
import numpy as np
from tqdm import tqdm

def face_rec(frames, p=None, klass=None):
    temp_face = []
    # Use the CNN detector when dlib was built with CUDA, otherwise HOG.
    mod = "cnn" if dlib.DLIB_USE_CUDA else "hog"
    for frame in tqdm(frames, total=len(frames)):
        # Work in BGR for the OpenCV crop/resize below; each crop is
        # converted back to RGB before being collected.
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
        face_locations = face_recognition.face_locations(
            frame, number_of_times_to_upsample=0, model=mod
        )
        # Collect every detected face from every frame; there is no longer
        # a per-face count capped at len(frames), so a multi-face frame
        # cannot starve later frames.
        for face_location in face_locations:
            top, right, bottom, left = face_location
            face_image = frame[top:bottom, left:right]
            face_image = cv2.resize(
                face_image, (224, 224), interpolation=cv2.INTER_AREA
            )
            face_image = cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB)
            temp_face.append(face_image)
    face_array = np.array(temp_face)
    return face_array, len(face_array)
```
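For context, a quick way to exercise the revised function might look like the sketch below. The `sample_frames` helper and the video path are my own illustration, not part of the repo; it assumes `face_rec` receives RGB frames:

```python
import cv2

def sample_frames(video_path, num_frames=15):
    # Hypothetical helper: grab roughly num_frames evenly spaced RGB frames.
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // num_frames, 1)
    frames = []
    for idx in range(0, total, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

faces, num_faces = face_rec(sample_frames("sample.mp4"))
print(faces.shape, num_faces)  # e.g. (N, 224, 224, 3) and N
```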
I am reading through the code to see how GenConViT handles multiple people's faces in one frame. There is a potential bug in the inference code for actual video input, in pred_func (Line 34). Since the for loop gates each detected face with

```python
if count < len(frames):
```

frames with many faces use up the budget early, so faces detected in later frames are ignored. For example, if three faces are detected in the first selected frame, count jumps to 3 immediately, and the last frames are never processed. Is this only designed for single-person video detection?
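To make the failure mode concrete, here is a minimal sketch of the pattern I mean; the loop structure is paraphrased from my reading of pred_func rather than copied verbatim, and the per-frame face counts are made up:

```python
# `count` increments once per detected face but is compared against the
# number of frames, so a multi-face frame consumes several "slots" at once.
frames_face_counts = [3, 1, 1, 1, 1]  # hypothetical faces detected per frame
count = 0
processed = []
for i, n_faces in enumerate(frames_face_counts):
    for _ in range(n_faces):
        if count < len(frames_face_counts):  # budget is len(frames) == 5
            count += 1
            processed.append(i)
print(processed)  # [0, 0, 0, 1, 2] -> frames 3 and 4 never contribute a face
```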