andrewowens / multisensory

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
http://andrewowens.com/multisensory/
Apache License 2.0

Questions about VoxCeleb2 dataset #36

Open YiyuLuo opened 4 years ago

YiyuLuo commented 4 years ago

Hello, thanks for your great work! I have been working with this model for a while, but I haven't been able to get results as good as those reported in your paper. After checking videos in the VoxCeleb2 dataset, I found that some of them contain audible background noise and are of low quality, whereas clean reference speech segments are necessary to compute the SDR metric. Did you select only high-quality videos for training and testing, and if so, how?
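
To illustrate why the reference quality matters for SDR, here is a minimal sketch. It assumes the plain energy-ratio definition, SDR = 10 log10(||s||^2 / ||s - s_hat||^2), which is not necessarily the evaluation used in the paper (BSS Eval style SDR is more involved). The function name `simple_sdr`, the synthetic 220 Hz tone, and the noise level are all hypothetical choices for illustration, not taken from the repository.

```python
import math
import random

def simple_sdr(reference, estimate, eps=1e-8):
    """SDR in dB using the plain energy-ratio form (an assumption, not BSS Eval)."""
    signal_energy = sum(r * r for r in reference)
    distortion_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps keeps the ratio finite when the estimate matches the reference exactly.
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))

random.seed(0)
sample_rate = 16000  # one second of synthetic audio

# Stand-in for a truly clean speech segment.
clean = [math.sin(2 * math.pi * 220.0 * n / sample_rate) for n in range(sample_rate)]

# Stand-in for a separation output: the clean signal plus a small residual error.
estimate = [c + random.gauss(0.0, 0.1) for c in clean]

# Stand-in for a "reference" clip that itself contains background noise.
noisy_reference = [c + random.gauss(0.0, 0.1) for c in clean]

sdr_clean_ref = simple_sdr(clean, estimate)
sdr_noisy_ref = simple_sdr(noisy_reference, estimate)

print(f"SDR against clean reference: {sdr_clean_ref:.2f} dB")
print(f"SDR against noisy reference: {sdr_noisy_ref:.2f} dB")
```

In this toy setup the same estimate scores lower against the noisy reference, because the noise in the reference is counted as distortion. That is the reason I suspect the low-quality clips affect the numbers I get.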

ruizewang commented 4 years ago

Hello, do you know how to use the pre-trained source separation model for evaluation and testing?