facebookresearch / svoice

We provide a PyTorch implementation of the paper "Voice Separation with an Unknown Number of Multiple Speakers", in which we present a new method for separating a mixed audio sequence where multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while keeping the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

Training for 10 speakers #63

Closed by muhammad-ahmed-ghani 2 years ago

muhammad-ahmed-ghani commented 2 years ago

Hi @adiyoss, can we use this model to train on mixtures of 10 speakers? I have tried, but it gives me an assertion error:

assert source.size() == estimate_source.size()

I have also changed the swave.C value to the number of speakers.

muhammad-ahmed-ghani commented 2 years ago

@adiyoss Any idea how to resolve this?

adiyoss commented 2 years ago

Hi @Muhammad-Ahmad-Ghani, it seems there is a mismatch between the input and output dimensions. If you set swave.C=10, the model will generate 10 different channels. In that case, did you check the dimensions of the source signals? Maybe you did not load all the data for supervision? Can you please check that?

muhammad-ahmed-ghani commented 2 years ago

@adiyoss Yes, I have created datasets for mixtures of 3, 5 and 10 speakers. The dimensions are correct and the model works for the 3- and 5-speaker mixtures, but it raises the error for the 10-speaker mixtures. I don't know why.

mirosakr commented 2 years ago

Salam Alicom @muhammad-ahmed-ghani, could you please tell me how to check the dimensions of the source signals? I have almost the same error.

muhammad-ahmed-ghani commented 2 years ago

WaAlaikum Asalam @mirosakr. The error is caused by the file search expression.

Replace the code in svoice/data/data.py at line 35:

From

re.compile(r's[0-9].json')

To

re.compile(r's[0-9]+.json')
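
To illustrate why this change matters, here is a minimal sketch. It assumes the pattern is applied to per-speaker file names such as s1.json through s10.json; the file list below is hypothetical and is not read from the repository. With the original pattern, s10.json is never matched, so only 9 source files would be picked up while the model produces 10 channels, which would explain why 3 and 5 speakers work and 10 does not. The third pattern, with an escaped dot, is a stricter variant added here for comparison and is not part of the fix above.

```python
import re

# Hypothetical file names for a 10-speaker dataset (assumed layout).
filenames = ["mix.json"] + [f"s{i}.json" for i in range(1, 11)]

old_pattern = re.compile(r's[0-9].json')       # original: exactly one digit
new_pattern = re.compile(r's[0-9]+.json')      # fix: one or more digits
strict_pattern = re.compile(r's[0-9]+\.json')  # variant: literal dot (assumption, not from the thread)


def matching(pattern, names):
    """Return the names in which the pattern is found anywhere."""
    return [name for name in names if pattern.search(name)]


old_matches = matching(old_pattern, filenames)
new_matches = matching(new_pattern, filenames)
strict_matches = matching(strict_pattern, filenames)

print("old pattern:", len(old_matches), "source files")
print("new pattern:", len(new_matches), "source files")
print("s10.json found by old pattern:", "s10.json" in old_matches)
```

In this sketch the unescaped dot in the original pattern consumes the second digit of "s10", after which the literal "json" can no longer match, so the tenth source file is skipped.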

mirosakr commented 2 years ago

@muhammad-ahmed-ghani Thanks for your reply. I did as you told me, but now I get this error:

  File "/kaggle/working/svoice/svoice/data/audio.py", line 78, in __getitem__
    num_frames=num_frames)[0]
  File "/opt/conda/lib/python3.7/site-packages/torchaudio/backend/sox_backend.py", line 56, in load
    filetype
RuntimeError: Offset past EOF

Any help would be appreciated.

muhammad-ahmed-ghani commented 2 years ago

Have you changed C: NUM_SPEAKERS (10 in my case) in conf/config.yaml at line 66?

mirosakr commented 2 years ago

Yes, I changed it.

muhammad-ahmed-ghani commented 2 years ago

Are you using the same dependency versions mentioned in the README? If yes, then I will have to look into it, because when I trained the model I only changed a few things for my dataset and everything worked fine.

mirosakr commented 2 years ago

Yes, I used the same dependency versions mentioned in the README. What do you want me to send? I'll do it.

I also tried changing the torchaudio version (unpinned torchaudio instead of torchaudio==0.6.0), but I got another error:

  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 4364, in _pad
    return _VF.constant_pad_nd(input, pad, value)
RuntimeError: start (0) + length (-1) exceeds dimension size (56000).

muhammad-ahmed-ghani commented 1 year ago

@mirosakr Sorry I didn't reply back then; I was busy with other things. If you are still working on your project and facing this issue, you can follow this repository: svoice_demo. It is based on a newer PyTorch version.