Hello, thanks for your great work!
I have been working with this model for a while, but I haven't been able to get results as good as those reported in your paper. After checking videos in the VoxCeleb2 dataset, I found that some of them contain audible background noise and are of low quality, whereas clean reference speech segments are needed to compute the SDR metric.
I'm wondering whether you selected only high-quality videos for the training and test phases, and if so, how?