facebookresearch / VisualVoice

Audio-Visual Speech Separation with Cross-Modal Consistency

The pre-trained cross-modal matching models (facial.pth and vocal.pth) #22

Open attutude opened 2 years ago

attutude commented 2 years ago

Hi, thanks for your great work. How can I generate the pre-trained cross-modal matching models, facial.pth and vocal.pth? I would like to train them on the VoxCeleb1 dataset. Is that possible, and if so, how should I do it?