facebookresearch / svoice

We provide a PyTorch implementation of the paper "Voice Separation with an Unknown Number of Multiple Speakers", in which we present a new method for separating a mixed audio sequence where multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

Question about evaluation result #13

Closed aod1310 closed 3 years ago

aod1310 commented 3 years ago

Hi everyone.

I trained the model with the code in this repository, but I got unexpected results.

I used only the WSJ-2mix dataset you linked.

In your paper, the SI-SNRi is 20.1 on the 2-speaker task, but when I trained this model it was only 7.24 at epoch 40. At the same epoch, the train and valid losses were -11.209 and -19.860, which I think is right, and the separated samples sounded good to me. I just wonder why the score is so low.

I didn't change the configuration file except for the batch size; I just ran the training code. Is there anything I'm missing?

Thank you.

adiyoss commented 3 years ago

Hi @aod1310, are you sure you are evaluating on the right dataset? The valid loss is -SI-SNR, which means that your SI-SNR on the validation set is 19.86. I do not recall such a gap between valid and test on wsj2mix.
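
For readers following along, here is a minimal sketch of how SI-SNR and SI-SNRi are commonly computed, to make the sign convention above concrete. It is not taken from this repository; the function names `si_snr` and `si_snr_improvement` and the `eps` parameter are illustrative assumptions, and it works on plain Python lists for a single source rather than batched tensors.

```python
import math

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between one estimated and one reference signal.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    n = len(target)
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    # Remove the mean so a constant offset does not affect the score.
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to get the "signal" part.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    projection = [scale * b for b in t]
    # Whatever is left after the projection counts as noise.
    noise = [a - p for a, p in zip(e, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(x * x for x in noise)
    return 10 * math.log10((signal_energy + eps) / (noise_energy + eps))

def si_snr_improvement(estimate, target, mixture):
    """SI-SNRi: gain of the estimate over using the raw mixture as the estimate."""
    return si_snr(estimate, target) - si_snr(mixture, target)

def neg_si_snr_loss(estimate, target):
    """A loss of the form -SI-SNR: a loss of -19.86 means an SI-SNR of 19.86 dB."""
    return -si_snr(estimate, target)
```

Note that SI-SNRi subtracts the SI-SNR of the unprocessed mixture, so it depends on the evaluation mixtures and references being the right files in the right format, not only on the model output.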

aod1310 commented 3 years ago

Thank you for your answer, you are right! After reading it, I listened to some files from my wsj-2mix test set and found that some speech files were incorrectly encoded. I think I typed the wrong command when using sox. My mistake.

Sorry to bother you, and thank you for your kind reply!! :)
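
For anyone who hits a similar train/test gap, a quick sanity check of the WAV headers in the test set can catch conversion mistakes before evaluation. The sketch below uses only Python's standard `wave` module. The function name `find_suspect_wavs` is hypothetical, and the expected format (8 kHz, mono, 16-bit PCM) is an assumption for illustration; the thread does not say which encoding was wrong, so adjust the defaults to match your own setup.

```python
import os
import wave

def find_suspect_wavs(root, expected_rate=8000, expected_channels=1, expected_width=2):
    """Return (path, reason) pairs for WAV files whose header looks off.

    The default expected values are assumptions, not values stated in the thread.
    """
    suspects = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.lower().endswith(".wav"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with wave.open(path, "rb") as wav:
                    rate = wav.getframerate()
                    channels = wav.getnchannels()
                    width = wav.getsampwidth()
                    frames = wav.getnframes()
            except (wave.Error, EOFError) as err:
                # The wave module rejects non-PCM or truncated files.
                suspects.append((path, f"unreadable: {err}"))
                continue
            if rate != expected_rate:
                suspects.append((path, f"sample rate {rate} Hz, expected {expected_rate}"))
            elif channels != expected_channels:
                suspects.append((path, f"{channels} channels, expected {expected_channels}"))
            elif width != expected_width:
                suspects.append((path, f"{8 * width}-bit samples, expected {8 * expected_width}"))
            elif frames == 0:
                suspects.append((path, "no audio frames"))
    return suspects
```

Calling `find_suspect_wavs("path/to/test/set")` returns an empty list when every header matches. A header check cannot detect every bad conversion, so listening to a few files, as done above, is still worthwhile.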