facebookresearch / VisualVoice

Audio-Visual Speech Separation with Cross-Modal Consistency

Speech enhancement evaluation #29

Open syl4356 opened 9 months ago

syl4356 commented 9 months ago

Hello, thanks for your great work.

I've been trying to reproduce the enhancement performance on the VoxCeleb2 test set, but the numbers from the provided pre-trained model were much lower than those reported in the paper. (I used evaluateSeparation.py from the main directory to compute the metrics; a sketch of my scoring setup is below.)
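Roughly, my scoring loop looks like the sketch below. The file paths are placeholders for my local data, and I'm assuming mir_eval's BSS-eval metrics, which I believe matches what evaluateSeparation.py computes; please correct me if the official script does something different.

```python
import numpy as np
import soundfile as sf
import mir_eval

# Placeholder paths: ground-truth clean speech and the model's outputs
# for one two-speaker test mixture.
ref1, sr = sf.read('gt/speaker1.wav')
ref2, _ = sf.read('gt/speaker2.wav')
est1, _ = sf.read('output/speaker1_separated.wav')
est2, _ = sf.read('output/speaker2_separated.wav')

# Trim everything to a common length before scoring.
n = min(len(ref1), len(ref2), len(est1), len(est2))
refs = np.stack([ref1[:n], ref2[:n]])
ests = np.stack([est1[:n], est2[:n]])

# BSS-eval: signal-to-distortion, -interference, and -artifact ratios.
sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(refs, ests)
print(f'SDR={sdr.mean():.2f}  SIR={sir.mean():.2f}  SAR={sar.mean():.2f}')
```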

And when I ran test_synthetic_script.sh, the outputs sounded poor to my ears. Listening to the mixture (audio_mixed.wav), the off-screen noise was much louder than the voices, so I suspect this mixing condition may simply be too hard for the model.

I have two questions regarding this.

  1. Is the pre-trained model in the av-enhancement directory your best model for speech enhancement, as opposed to separation?
  2. Was your evaluation done on mixtures of two speech signals plus off-screen noise added with weight 1 (my understanding is sketched below)? Isn't it too difficult for the model to separate and enhance at the same time?
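For clarity, this is what I understand the mixing in test_synthetic_script.sh to amount to; the paths are placeholders and the weight-1 noise term is my reading of the script, so please correct me if I've misread it.

```python
import numpy as np
import soundfile as sf

# Placeholder paths: two on-screen speech tracks and one off-screen noise track.
s1, sr = sf.read('speech1.wav')
s2, _ = sf.read('speech2.wav')
noise, _ = sf.read('noise.wav')

# Sum the trimmed tracks; the noise is added with weight 1.0 (my reading).
n = min(len(s1), len(s2), len(noise))
mixture = s1[:n] + s2[:n] + 1.0 * noise[:n]
sf.write('audio_mixed.wav', mixture, sr)
```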

Thanks in advance.