Hello, thanks for your great work.

I've been trying to reproduce the enhancement results on the VoxCeleb2 test set, but the given pre-trained model performs much worse than the numbers reported in the paper.
(I used evaluateSeparation.py from the main directory to compute the metrics.) When I ran test_synthetic_script.sh, the outputs also sounded bad to my ears: in the mixture (audio_mixed.wav), the off-screen noise was much louder than the voice, so I suspect this setup makes the enhancement task very hard for the model.
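For reference, this is how I understood the mixture construction (just a sketch of my assumption; the actual weights and mixing code in the repo may differ):

```python
import numpy as np

def mix(speech_a, speech_b, noise, noise_weight=1.0):
    """Sum two speech tracks and an off-screen noise track.

    My assumption: the noise is scaled by `noise_weight` (here 1.0,
    i.e. not attenuated at all), which would explain why the noise
    dominates what I hear in audio_mixed.wav.
    """
    # Truncate all tracks to the shortest one before summing.
    n = min(len(speech_a), len(speech_b), len(noise))
    return speech_a[:n] + speech_b[:n] + noise_weight * noise[:n]
```

If the evaluation instead attenuates the noise (e.g. `noise_weight < 1.0`), please let me know, since that would change the difficulty of the task considerably.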
I have three questions regarding this:

1. Is the pre-trained model in the av-enhancement directory your best model for speech enhancement (as opposed to separation)?
2. Was your evaluation done on a mixture of two speech tracks plus an off-screen noise with weight 1?
3. Isn't it too difficult for the model to separate and enhance at the same time?
Thanks in advance.