facebookresearch / svoice

We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers, in which we present a new method for separating a mixed audio sequence in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices over multiple processing steps, while keeping the speaker in each output channel fixed. A different model is trained for every possible number of speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

Which one should be specified in mix_json in debug.yaml? #42

Closed srdfjy closed 3 years ago

srdfjy commented 3 years ago

Hi, I have generated the following dataset, where the speakers in train, valid, and test are all independent. Which one should be specified in mix_json in debug.yaml?

egs/test/
├── cv
│   ├── mix.json
│   ├── s1.json
│   └── s2.json
├── tr
│   ├── mix.json
│   ├── s1.json
│   └── s2.json
└── tt
    ├── mix.json
    ├── s1.json
    └── s2.json

adiyoss commented 3 years ago

Hi @srdfjy, the mix_json parameter in debug.yaml (or any other dataset config) is used for separating mixtures every 10 epochs (you can change this interval with the eval_every parameter in the config file), so you can listen to the samples. If your validation set is not too big, you can use the mix.json file from the cv folder. If your validation set is big, I suggest creating a small subset of it and using that for the mix_json param, so it won't slow down the training process much.
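A minimal sketch of building such a subset, assuming mix.json is a JSON array of per-utterance entries (the helper name and paths below are hypothetical; adjust to your actual manifest format):

```python
import json
from pathlib import Path

def make_debug_subset(mix_json_path, out_path, n_samples=20):
    """Copy the first n_samples entries of a large mix.json into a
    smaller manifest suitable for the mix_json field in debug.yaml.

    Assumes the manifest is a JSON array of entries; if your format
    differs, adapt the load/slice step accordingly.
    """
    entries = json.loads(Path(mix_json_path).read_text())
    subset = entries[:n_samples]
    Path(out_path).write_text(json.dumps(subset, indent=2))
    return len(subset)
```

You could then point mix_json in debug.yaml at the smaller file you wrote, so only those few mixtures are separated at each evaluation.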

srdfjy commented 3 years ago

@adiyoss thanks