facebookresearch / svoice

We provide a PyTorch implementation of the paper "Voice Separation with an Unknown Number of Multiple Speakers", in which we present a new method for separating a mixed audio sequence where multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while keeping the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

Pre-trained models #1

Open rafaelvalle opened 3 years ago

rafaelvalle commented 3 years ago

Thank you for making this repository public. This model looks much better than the other models it was compared against.

When are you making pre-trained models available on the repo?

adiyoss commented 3 years ago

Hi @rafaelvalle, Unfortunately, since the model was trained using the WSJ dataset, which is not publicly available, we cannot legally release pre-trained models. However, if you have access to this dataset, you can train the model on your own (all relevant details are in the repo). Another option is to train using the LibriMix dataset (based on LibriSpeech).

jainal09 commented 3 years ago

I don't understand why this is a problem, because I have seen Nvidia open-source their NeMo ASR model that was trained on the WSJ dataset: https://ngc.nvidia.com/catalog/models/nvidia:wsj_quartznet_15x5

FarisHijazi commented 3 years ago

If anyone does train a model on any other dataset PLEASE do upload it and help us out, thanks

adiyoss commented 3 years ago

Hi @jainal09, We are working on it! :) I believe we will be able to upload some models soon. Will update of course :)

AlexeyBoiler commented 3 years ago

Hello. Do you plan to train on the LibriMix dataset (based on LibriSpeech)?

ostapstephan commented 3 years ago

@adiyoss Is there any update as to when the pre-trained model on LibriSpeech will be available? Edit: I was wondering if you could please provide some information about the resources that were required to train the model to completion. Figure 4 in the paper shows the training over 60 hours but does not mention the hardware used. Was this on a single GPU or on hundreds of them?

nshreyasvi commented 3 years ago

Hello, I tried to run the voice separation using the trained models available at https://ngc.nvidia.com/catalog/models/nvidia:wsj_quartznet_15x5 but got the following error in deserialize_model: the line klass = package['class'] raised KeyError: 'class'. Do you know how to fix this error and run voice separation on a custom .wav file?
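
The KeyError above suggests that the loaded checkpoint is a dictionary without a 'class' entry, which is what one would expect if the file was saved for a different model family (the linked checkpoint is an ASR model, not a separation model). A minimal sketch of a pre-flight check follows. The helper name missing_package_keys and the example dictionaries are hypothetical, and 'class' is the only key known from the traceback; any other keys the loader may need are not covered here.

```python
def missing_package_keys(package, required=("class",)):
    """Return the required keys that are absent from a loaded checkpoint dict.

    'class' is the only key taken from the traceback in this thread;
    extend `required` only with keys you have confirmed the loader reads.
    """
    if not isinstance(package, dict):
        raise TypeError(
            f"expected a dict-like package, got {type(package).__name__}"
        )
    return [key for key in required if key not in package]


# Hypothetical stand-ins for checkpoints that have already been loaded.
foreign_checkpoint = {"state_dict": {}}          # layout from another toolkit
compatible_checkpoint = {"class": object, "state": {}}

print(missing_package_keys(foreign_checkpoint))     # ['class']
print(missing_package_keys(compatible_checkpoint))  # []
```

If the check reports a missing 'class' key, the file was most likely not saved by this repository's training code, and renaming keys would not make it a voice separation model.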

hevinyu commented 3 years ago

Great job. Looking forward to the pre-trained models.

jeffshee commented 3 years ago

@adiyoss Hi, thanks for the repo. Is there any update on pretrained models?

RHTNT commented 9 months ago

A pretrained model would be great :)