Closed · davidingram123 closed this 1 hour ago
Good point! These edge cases aren't very common, but they do happen. I can't check my ANSWER_00845.pkl
file at the moment, but I recall encountering a few similar instances in the training dataset.
This is one reason I chose MediaPipe over other face detectors: it includes a face-tracking feature. With `max_num_faces = 1`, these errors are significantly less frequent than with other detectors.
You can see the relevant code here:
https://github.com/KAIST-AILab/SyncVSR/blob/db5e50e9677c815169c0587c17a52f20a50bd7d8/LRW/video/src/preprocess_roi.py#L17-L22
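For reference, the initialization in that file looks roughly like the following. This is a minimal sketch, not a copy of the repo's code; the exact confidence thresholds and flags there may differ.

```python
import mediapipe as mp

# Sketch of a FaceMesh setup for video preprocessing (arguments are
# illustrative, not necessarily those used in preprocess_roi.py).
# static_image_mode=False enables MediaPipe's internal tracking across
# consecutive video frames; max_num_faces=1 keeps only the tracked face.
face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=False,
    max_num_faces=1,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
)
```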
However, there's room for improvement. For instance, setting `max_num_faces = 2` and adding a script to select the centrally positioned face when multiple faces are detected could help. Currently, in the linked code, we simply take the first face returned by MediaPipe.
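That selection step could be sketched as follows. The helper below is hypothetical (not part of the SyncVSR repo): given the landmark sets returned with `max_num_faces = 2`, it keeps the face whose mean x-coordinate is closest to the horizontal center of the frame, using MediaPipe's normalized coordinates in [0, 1].

```python
def select_central_face(faces):
    """Return the landmark set closest to the horizontal center.

    `faces` is a list of landmark lists; each landmark is an (x, y)
    pair with x normalized to [0, 1], as MediaPipe returns.
    Hypothetical helper, not part of the repo.
    """
    def center_distance(landmarks):
        mean_x = sum(x for x, _ in landmarks) / len(landmarks)
        return abs(mean_x - 0.5)

    return min(faces, key=center_distance)

# Toy example with two "faces" of two landmarks each:
left_face = [(0.15, 0.5), (0.25, 0.5)]    # mean x = 0.20
center_face = [(0.45, 0.5), (0.55, 0.5)]  # mean x = 0.50
print(select_central_face([left_face, center_face]) is center_face)  # True
```

In the ANSWER_00845 example above, this would prefer the speaker near the middle of the frame over a bystander at the edge.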
Despite these occasional misdetections, our model's reported performance is still achievable.
Thank you, I understand.
Sorry to bother you, but I have another question. The image above shows the contents of the .pkl file corresponding to "LRW/lipread_mp4/ANSWER/train/ANSWER_00845.mp4". It has clearly extracted the wrong face; the correct face belongs to the person slightly to the right. Is this normal, or an isolated case? I opened this file at random and found it incorrect. Does the .pkl file you extracted for ANSWER/train/ANSWER_00845.mp4 show the same issue?