Open mrgloom opened 5 years ago
I have the same question. Does anybody know?
I recently read “Visual recognition of human communication”, and some information from it may help. This is just a GUESS, though:
Offset: the time offset between video and audio
FV: Face Verification
ASD: Active Speaker Detection
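If it helps anyone inspect these fields, here is a rough sketch of reading such a header into a dict. It assumes the header is a run of `Key : value` lines ended by a blank line. That layout, the `parse_header` helper, the field names `FV Conf` and `ASD Conf`, and the sample values are all my own assumptions for illustration, not taken from the dataset documentation.

```python
def parse_header(text):
    """Collect 'Key : value' lines up to the first blank line (assumed layout)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break  # assumption: a blank line ends the header
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


# Made-up sample for illustration only, not real dataset content.
sample = "Offset : -5\nFV Conf : 16.0\nASD Conf : 4.5\n\nFRAME X Y W H\n"

header = parse_header(sample)
offset = int(header["Offset"])  # audio/video offset if the guess above is right; units not confirmed
```

The values stay as strings so that you can decide how to interpret each field once its meaning is confirmed.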
What do the VoxCeleb2 header fields mean?