facebookresearch / svoice

We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers, in which we present a new method for separating a mixed audio sequence where multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while keeping the speaker in each output channel fixed. A different model is trained for every possible number of speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

error while trying to train #25

Closed th3geek closed 3 years ago

th3geek commented 3 years ago

Hi,

I overcame my OOM problem (from #24) while trying to train on the included debug set by setting R=2 and segment=2. I'm now trying to train on the LibriMix dataset but have encountered the following error:

(svoice) user@system:/media/user/svoice/svoice/svoice$ python train.py sample_rate=16000 dset=libri4mix segment=2 verbose=1
[2021-04-10 17:18:56,680][__main__][INFO] - For logs, checkpoints and samples check /media/user/svoice/svoice/svoice/outputs/exp_dset=libri4mix,sample_rate=16000,segment=2
[2021-04-10 17:18:56,680][__main__][DEBUG] - {'sample_rate': 16000, 'segment': 2, 'stride': 1, 'pad': True, 'cv_maxlen': 8, 'validfull': 1, 'num_prints': 5, 'device': 'cuda', 'num_workers': 5, 'verbose': 1, 'show': 0, 'checkpoint': True, 'continue_from': '', 'continue_best': False, 'restart': False, 'checkpoint_file': 'checkpoint.th', 'history_file': 'history.json', 'samples_dir': 'samples', 'seed': 2036, 'dummy': None, 'pesq': False, 'eval_every': 10, 'keep_last': 0, 'optim': 'adam', 'lr': 0.0005, 'beta2': 0.999, 'stft_loss': False, 'stft_sc_factor': 0.5, 'stft_mag_factor': 0.5, 'epochs': 100, 'batch_size': 4, 'max_norm': 5, 'lr_sched': 'step', 'step': {'step_size': 2, 'gamma': 0.98}, 'plateau': {'factor': 0.5, 'patience': 5}, 'model': 'swave', 'swave': {'N': 128, 'L': 8, 'H': 128, 'R': 2, 'C': 2, 'input_normalize': False}, 'ddp': False, 'ddp_backend': 'nccl', 'rendezvous_file': './rendezvous', 'rank': None, 'world_size': None, 'dset': {'train': '/media/user/svoice/svoice/svoice/egs/libri4mix/tr', 'valid': '/media/user/svoice/svoice/svoice/egs/libri4mix/tr', 'test': '/media/user/svoice/svoice/svoice/egs/libri4mix/tr', 'mix_json': '/media/user/svoice/svoice/svoice/egs/libri4mix/tr/mix.json', 'mix_dir': None}}
[2021-04-10 17:18:57,312][__main__][INFO] - Running on host system
[2021-04-10 17:18:59,374][svoice.solver][INFO] - ----------------------------------------------------------------------
[2021-04-10 17:18:59,374][svoice.solver][INFO] - Training...
[2021-04-10 17:18:59,727][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 118, in main
    _main(args)
  File "train.py", line 112, in _main
    run(args)
  File "train.py", line 93, in run
    solver.train()
  File "/media/user/svoice/svoice/svoice/svoice/solver.py", line 122, in train
    train_loss = self._run_one_epoch(epoch)
  File "/media/user/svoice/svoice/svoice/svoice/solver.py", line 210, in _run_one_epoch
    sources, est_src, lengths)
  File "/media/user/svoice/svoice/svoice/svoice/models/sisnr_loss.py", line 23, in cal_loss
    source_lengths)
  File "/media/user/svoice/svoice/svoice/svoice/models/sisnr_loss.py", line 39, in cal_si_snr_with_pit
    assert source.size() == estimate_source.size()
AssertionError
(svoice) user@system:/media/user/svoice/svoice/svoice$

I've generated the relevant json files for the wavs and created the corresponding config file in the dset/ directory. The only variables I've changed are R=2, sample_rate=16000, dset=libri4mix, and segment=2. I'm considering renting a cloud instance with a GPU that has enough memory to train the model with the proper R and segment values, but I'd like to be sure there won't be any errors like this beforehand.

adiyoss commented 3 years ago

Hi @th3geek, from the traceback it seems like there is a mismatch between the sizes of the estimate and the target. How many sources do you want to separate? The default is 2.
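To illustrate the mismatch: the model's output channel count comes from the swave.C setting (default 2), while the target tensor's channel count comes from the dataset (4 speakers for Libri4Mix). The shapes below are a hypothetical sketch, not taken from the repo, assuming tensors of shape (batch, channels, time):

```python
# Hypothetical shapes illustrating the AssertionError in cal_si_snr_with_pit.
batch, time_steps = 4, 32000             # batch_size=4, 2 s segments at 16 kHz

estimate_shape = (batch, 2, time_steps)  # model output with default swave.C=2
source_shape = (batch, 4, time_steps)    # Libri4Mix targets: 4 speakers

# This mirrors `assert source.size() == estimate_source.size()` in sisnr_loss.py:
# the channel dimensions (2 vs 4) differ, so the assertion fails during training.
assert source_shape != estimate_shape
```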

th3geek commented 3 years ago

Hi,

Yes, I'm trying to separate 4 sources. I've looked through the config and don't see the option for specifying the number of sources?

adiyoss commented 3 years ago

You are right, I should have been clearer about that. You should set swave.C=4. See here: https://github.com/facebookresearch/svoice/blob/master/conf/config.yaml#L66
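For reference, the fix amounts to appending the swave.C=4 override to the training command from the original report (same command-line override style used for the other options; paths and dataset names as in the issue):

```shell
python train.py sample_rate=16000 dset=libri4mix segment=2 verbose=1 swave.C=4
```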

th3geek commented 3 years ago

That worked! Thank you.

Suma3 commented 3 years ago

Hi @adiyoss @th3geek, as mentioned in the paper, this model can separate an unknown number of sources in overlapped speech. Can you explain how to do this? For example, if the mix files contain different numbers of mixed sources, what should the value of swave.C be, or is there a different approach to this problem?
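Per the paper's abstract, the model trained with the largest number of output channels is used to select the actual number of speakers in a sample. The sketch below is one possible post-processing heuristic for that idea, not the repo's actual API: run the largest-C model and count output channels whose energy exceeds a silence threshold. The function name and threshold are assumptions for illustration.

```python
# Hedged sketch: estimate the active speaker count from the outputs of the
# largest-C model by treating low-energy channels as silent. This follows the
# paper's idea of using the largest model to select the number of speakers;
# `count_active_speakers` and `silence_threshold` are hypothetical.

def count_active_speakers(channels, silence_threshold=1e-3):
    """channels: list of per-channel sample lists from the largest-C model."""
    def energy(samples):
        # Mean squared amplitude of one output channel.
        return sum(s * s for s in samples) / max(len(samples), 1)
    return sum(1 for ch in channels if energy(ch) > silence_threshold)

# Toy example: two active channels, two near-silent ones.
outputs = [
    [0.5, -0.4, 0.3],    # active speaker
    [0.2, 0.3, -0.25],   # active speaker
    [1e-4, -1e-4, 0.0],  # silence
    [0.0, 0.0, 1e-5],    # silence
]
print(count_active_speakers(outputs))  # → 2
```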