ajinkyaT / Lip_Reading_in_the_Wild_AVSR

Audio-Visual Speech Recognition using Deep Learning

Threshold for Confidence in lip sync error #2

Open omcar17 opened 5 years ago

omcar17 commented 5 years ago

Hello, I am using SyncNet to find the lip sync error in videos, but I am getting seemingly random values of synchronisation error (Confidence) for both well-dubbed and badly dubbed videos. I am using the pretrained weights already available on the website, and I am testing on my own data (recorded with a webcam). What should the threshold for confidence be?

ajinkyaT commented 5 years ago

There isn't any confidence threshold set. The lowest value indicates a well-dubbed video. Hope that answers your question.

omcar17 commented 5 years ago

Thank you for your quick response. Actually, I am confused by the results of this model for genuine and false videos. Here is one sample. For a genuine video, I got AV offset: 0, Min dist: 7.623, Confidence: 1.004. For a false video, I got AV offset: -1, Min dist: 7.873, Confidence: 1.151. So how do I conclude whether there is any lip sync error in the videos? Both videos are 7 seconds long.

ajinkyaT commented 5 years ago

AV Offset: 0: no lag, >0: audio leads video, <0: audio lags video

Min dist: Euclidean distance between the outputs of the last layer for the audio and video features

Confidence: close to zero if the video and audio are not correlated, i.e. they are completely different
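As a rough illustration of how these three numbers can relate to each other, here is a minimal sketch. It assumes that the mean audio-video embedding distance is computed at each candidate shift from -vshift to +vshift, that Min dist is the smallest of those distances, that AV offset is the shift at which it occurs, and that Confidence is the median distance minus the minimum. That is my understanding of the reference SyncNet code, but treat it as an assumption and check it against the code you are running. The function name `summarize_sync`, the sign convention, and the sample distances are hypothetical and are not taken from this repo.

```python
from statistics import median


def summarize_sync(mean_dists, vshift):
    """Summarize a list of mean audio-video distances, one per candidate shift.

    Assumption: mean_dists has 2 * vshift + 1 entries and index i corresponds
    to an offset of (vshift - i) frames. The sign convention is assumed.
    """
    # Index of the shift at which audio and video embeddings are closest.
    min_idx = min(range(len(mean_dists)), key=lambda i: mean_dists[i])
    min_dist = mean_dists[min_idx]
    offset = vshift - min_idx
    # Assumed definition: how far the best shift stands out from a typical one.
    confidence = median(mean_dists) - min_dist
    return offset, min_dist, confidence


# Made-up distances with a clear dip at the centre shift.
sharp = [9.0, 8.5, 7.0, 8.6, 9.1]
# Made-up distances that are nearly flat across all shifts.
flat = [8.0, 8.1, 7.9, 8.0, 8.1]

for name, dists in (("sharp", sharp), ("flat", flat)):
    offset, min_dist, confidence = summarize_sync(dists, vshift=2)
    print(name, offset, min_dist, round(confidence, 3))
```

Under these assumptions, Confidence only says how sharply the best shift stands out from the others, so it is more useful for comparing videos processed the same way than for checking against a fixed cut-off.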

For more information, please check the paper by the original authors: Out of time: automated lip sync in the wild