Closed: ltdt-apex closed this issue 3 years ago
Your observation is correct: due to the way MTCNN extracts faces, some can indeed get mixed up, but in our experience this is quite rare, especially for the kind of videos you are considering. To mitigate this, one possible strategy is to use a larger number of faces for each subject, which we also explore in relation to performance in the paper. In any case, the more pragmatic approach remains to classify all faces indiscriminately and extract a single value per video.
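For reference, a rough sketch of what "classify all faces indiscriminately and extract a single value" could look like in practice. This is not the repository's code: model, preprocess and video_score are hypothetical stand-ins, and mean pooling is just one possible aggregation choice.

import numpy as np
import torch

@torch.no_grad()
def video_score(face_crops, model, preprocess, device="cpu"):
    """Score every detected face in a video, regardless of subject identity,
    and reduce the per-face scores to a single value for the whole video.

    face_crops: list of HxWx3 uint8 arrays, one per detected face.
    model, preprocess: hypothetical classifier and its input transform.
    """
    if not face_crops:
        return float("nan")
    batch = torch.stack([preprocess(face) for face in face_crops]).to(device)
    logits = model(batch).squeeze(1)    # one logit per face
    scores = torch.sigmoid(logits)      # per-face fake probability
    return scores.mean().item()         # single per-video value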
Thanks for the answer.
In the preprocessing code (detect_face.py and extract_crop.py) it seems that the order of the faces within each frame is not taken into account, whereas the evaluation code in test.py seems to assume that the order of faces is consistent across all frames. Because of that, test.py may behave unexpectedly, e.g. mixing different people's faces into the same group and producing wrong predictions. Am I misunderstanding something, or is the code simply not meant to be deployed on videos with multiple faces?
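In case it helps anyone facing the same situation, one purely illustrative workaround (not part of this repository) is to associate detections across frames by bounding-box IoU before grouping crops by subject, so that faces of the same person stay in the same track even if MTCNN returns them in a different order per frame. The function names below are hypothetical and the greedy matching is only a sketch.

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def group_by_identity(detections, iou_thr=0.3):
    """Greedily link per-frame detections into per-subject tracks.

    detections: list over frames; each entry is a list of boxes for that frame.
    Returns a list of tracks, each a list of (frame_idx, box) pairs.
    """
    tracks = []
    for frame_idx, boxes in enumerate(detections):
        unmatched = list(boxes)
        for track in tracks:
            if not unmatched:
                break
            last_box = track[-1][1]
            best = max(unmatched, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= iou_thr:
                track.append((frame_idx, best))
                unmatched.remove(best)
        # any detection not matched to an existing track starts a new one
        for box in unmatched:
            tracks.append([(frame_idx, box)])
    return tracks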