Open omcar17 opened 5 years ago
There isn't any fixed confidence threshold. The video with the lowest minimum distance (and correspondingly higher confidence) indicates a well-synchronised dubbed video. Hope it answers your question.
Thank you for your quick response. Actually, I am confused by the results this model gives for genuine and false videos. Here is one sample: for a genuine video, I got AV offset: 0, Min dist: 7.623, Confidence: 1.004. For a false video, I got AV offset: -1, Min dist: 7.873, Confidence: 1.151. So how do I conclude whether there is any lip-sync error in the videos? The duration of both videos is 7 seconds.
- AV offset: 0 means no lag, >0 means audio leads video, <0 means audio lags video
- Min dist: Euclidean distance between the outputs of the last layer for the audio and video features
- Confidence: close to zero if the video and audio are not correlated, i.e. they are completely different
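To make the relationship between these three values concrete, here is a minimal sketch of how they can be derived from SyncNet-style output. It assumes you already have a 1-D array of mean Euclidean distances, one per candidate audio-video shift over a symmetric window (the exact window size and sign convention vary by implementation, so treat `av_offset` here as illustrative):

```python
import numpy as np

def interpret_distances(dists):
    """Interpret a 1-D array of per-offset mean Euclidean distances.

    Assumes dists[i] is the mean audio-video feature distance at
    candidate shift i, with shifts centred on zero (e.g. -15..+15).
    Returns (av_offset, min_dist, confidence).
    """
    dists = np.asarray(dists, dtype=float)
    vshift = (len(dists) - 1) // 2          # index of the zero-shift entry
    min_idx = int(np.argmin(dists))         # best-matching shift
    av_offset = vshift - min_idx            # sign convention is implementation-specific
    min_dist = float(dists[min_idx])        # "Min dist" reported by the tool
    # Confidence: how sharply the minimum stands out from typical distances.
    # Near zero => flat curve => audio and video are uncorrelated.
    confidence = float(np.median(dists) - dists[min_idx])
    return av_offset, min_dist, confidence

# Example: a clear minimum at the centre shift
offset, mdist, conf = interpret_distances([9.0, 8.0, 7.0, 8.0, 9.0])
# offset == 0, mdist == 7.0, conf == 1.0
```

The key point for thresholding: a well-synchronised video produces a distance curve with a pronounced dip (high confidence), while an uncorrelated pair produces a flat curve (confidence near zero), regardless of the absolute minimum distance.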
For more information, please check the paper by the original authors: Out of time: automated lip sync in the wild
Hello, I am using SyncNet to find the lip-sync error in videos, but I am getting very random values of synchronisation error (confidence) for both well- and badly-dubbed videos. I am using the pre-trained weights already available on the website, and I am testing it on my own data (recorded with a webcam). What should the threshold for confidence be?