astorfi / lip-reading-deeplearning

:unlock: Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
Apache License 2.0

contrastive loss #21

Closed Maxfashko closed 4 years ago

Maxfashko commented 5 years ago

Hi! Thank you for the excellent work and the publicly available code. I would like to know how to get a metric that shows how far out of sync the video and audio are for a given sample. For example, in this paper, the authors report AV offset, Min dist, and Confidence: http://www.robots.ox.ac.uk:5000/~vgg/publications/2016/Chung16a/chung16a.pdf

omcar17 commented 5 years ago

I am also trying to find the same thing. Any update on this issue?

astorfi commented 5 years ago

@Maxfashko @omcar17 Thank you for your attention ... I did not investigate that in my paper. The paper you brought up does a great job with a simple metric. My approach, however, only indicates whether a pair of audio-visual streams is out of sync; it does not estimate the offset.
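
For reference, a metric in the spirit of the one in the linked paper can be sketched as follows. This is a minimal illustration and not code from this repository. It assumes you already have one embedding per time step for each stream (the hypothetical arrays video_emb and audio_emb, both of shape (T, D)), that the two streams share the same step rate, and that Confidence is taken as the median distance minus the minimum distance over all tested shifts, which is my reading of the paper. The function name av_sync_metrics and the parameter max_shift are made up for this example.

```python
import numpy as np

def av_sync_metrics(video_emb, audio_emb, max_shift=15):
    """Estimate (offset, min_dist, confidence) from per-step embeddings.

    video_emb, audio_emb: arrays of shape (T, D) (assumed inputs).
    max_shift: largest shift, in time steps, tested in each direction.
    """
    video_emb = np.asarray(video_emb, dtype=float)
    audio_emb = np.asarray(audio_emb, dtype=float)
    n = min(len(video_emb), len(audio_emb))

    shifts, mean_dists = [], []
    for shift in range(-max_shift, max_shift + 1):
        # Pair video step t with audio step t + shift, keeping valid indices only.
        v_start, a_start = max(0, -shift), max(0, shift)
        length = n - abs(shift)
        if length <= 0:
            continue
        v = video_emb[v_start:v_start + length]
        a = audio_emb[a_start:a_start + length]
        # Mean Euclidean distance between the paired embeddings at this shift.
        mean_dists.append(np.linalg.norm(v - a, axis=1).mean())
        shifts.append(shift)

    mean_dists = np.array(mean_dists)
    best = int(np.argmin(mean_dists))
    min_dist = float(mean_dists[best])
    # Assumption: confidence = median distance minus minimum distance.
    confidence = float(np.median(mean_dists) - min_dist)
    return shifts[best], min_dist, confidence
```

A positive offset here means the audio lags the video by that many steps. A confidence near zero suggests that no shift stands out, so the estimated offset should not be trusted.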

ModestYjx commented 5 years ago

Hello, when I run the program, I get a format error. How can I solve it?

Xiaokeai18 commented 5 years ago

@yang929604665 @astorfi This happens because the NumPy array is not writable, and recent versions of dlib may raise this error in that case. Just add the following line above the line that raises the error (line 99, detections = detector(frame,1)):

frame.setflags(write=True)

I had the same problem before and fixed it by adding this line.
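
A slightly more defensive variant of the same fix is sketched below. It is only an illustration, and the helper name ensure_writable is made up for this example. NumPy refuses setflags(write=True) with a ValueError when the array is a view of memory it does not own and that memory is read-only, so the sketch falls back to a writable copy in that case.

```python
import numpy as np

def ensure_writable(frame):
    """Return a writable version of frame, copying only when necessary."""
    if frame.flags.writeable:
        return frame
    try:
        # Same call as in the fix above; works when NumPy allows the flag change.
        frame.setflags(write=True)
        return frame
    except ValueError:
        # The underlying buffer is read-only, so make an independent copy.
        return np.array(frame, copy=True)
```

In the script it would be used just before the detector call, for example frame = ensure_writable(frame).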

astorfi commented 4 years ago

@Xiaokeai18 Thank you for your suggestion. Could you open a pull request?