taewookim closed this issue 6 years ago
@taewookim
thank you @astorfi Regarding Q3: have you ever run the model on videos where the speakers are speaking a non-English language? The model doesn't have to be super accurate, but I was wondering if it is 'good enough' to detect audio spoofing in videos of non-English speakers.
Suppose a spoofer was attempting to bypass a system that uses both face and speech recognition. He would hold up a video containing the victim's face and recorded voice, played on, say, an iPad. He would hide from the detection camera (to defeat facial recognition) and use his own voice, not the voice from the iPad (to defeat the speech recognition system).
A simple solution might be to look at the time offsets of the spoken words and compare them with the time offsets of the lip movements. Of course, this isn't perfect, but it's at least somewhere to start from. Any idea what part of your code I could modify to detect this?
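For what it's worth, a minimal sketch of that offset-comparison idea: given two per-frame activity signals that you would have to extract yourself (e.g. per-frame audio energy from the soundtrack and a mouth-opening measure from the face landmarks — both names and inputs here are hypothetical, not part of this repo), you can estimate the audio/video lag with a normalized cross-correlation and flag clips whose best lag is far from zero:

```python
import numpy as np

def estimate_av_offset(audio_activity, lip_activity, fps=25, max_lag=12):
    """Estimate the audio/video offset between two per-frame activity
    signals via normalized cross-correlation.

    audio_activity: 1-D array, e.g. per-frame audio energy (speech loudness)
    lip_activity:   1-D array, e.g. per-frame mouth-opening measure
    Both are assumed to be sampled at the video frame rate and equal length.
    Returns (offset_in_frames, offset_in_seconds); a large offset suggests
    the audio is not in sync with the lips.
    """
    # Standardize both signals so the correlation is scale-invariant.
    a = (audio_activity - audio_activity.mean()) / (audio_activity.std() + 1e-8)
    v = (lip_activity - lip_activity.mean()) / (lip_activity.std() + 1e-8)

    lags, scores = range(-max_lag, max_lag + 1), []
    for lag in lags:
        # Correlate a[t + lag] against v[t] over the overlapping region.
        if lag >= 0:
            s = np.dot(a[lag:], v[:len(v) - lag])
        else:
            s = np.dot(a[:lag], v[-lag:])
        scores.append(s / len(a))

    best_lag = max(zip(scores, lags))[1]
    return best_lag, best_lag / fps
```

This is only a rough heuristic under the stated assumptions; a learned model such as the one in "Out of time: automated lip sync in the wild" does the same alignment much more robustly.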
No, I personally did not run it on a non-English dataset, but the paper I mentioned did ("Out of time: automated lip sync in the wild"). About the question you are asking, unfortunately, I am not an expert.
thank you
Excuse my complete noob-ness
1) Is the model trying to accurately determine whether the video (i.e., the shape of the lips) and the audio are synced?
2) Are there any pre-trained weights I can download to run it?
3) Assuming my understanding in Q1 is correct: has anyone tested whether this model can accurately detect audio/video synchronization in non-English languages?