facebookresearch / svoice

We provide a PyTorch implementation of the paper "Voice Separation with an Unknown Number of Multiple Speakers", in which we present a new method for separating a mixed audio sequence in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while keeping the speaker in each output channel fixed. A different model is trained for every possible number of speakers, and the model with the largest number of speakers is used to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

RuntimeError: Calculated padded input size per channel: (0). Kernel size: (8). Kernel size can't be greater than actual input size #102

Open intelligenceabhii opened 3 weeks ago

intelligenceabhii commented 3 weeks ago

Training the model on the sample data provided in the repo, I encounter this error. If I change the L values in config.py, it crashes as well.

/media/wesee/nv_ssd/meta/svoice/train.py:115: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1
  @hydra.main(config_path="conf", config_name='config.yaml')
/home/wesee/miniforge3/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config.yaml': Defaults list is missing _self_. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
/home/wesee/miniforge3/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:415: UserWarning: In config.yaml: Invalid overriding of hydra/job_logging: Default list overrides requires 'override' keyword. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/defaults_list_override for more information.
  deprecation_warning(msg)
/home/wesee/miniforge3/lib/python3.10/site-packages/hydra/_internal/defaults_list.py:415: UserWarning: In config.yaml: Invalid overriding of hydra/hydra_logging: Default list overrides requires 'override' keyword. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/defaults_list_override for more information.
  deprecation_warning(msg)
/home/wesee/miniforge3/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[2024-09-05 14:35:31,039][__main__][INFO] - For logs, checkpoints and samples check /media/wesee/nv_ssd/meta/svoice/outputs/exp
[2024-09-05 14:35:34,635][__main__][INFO] - Running on host ubuntu
[2024-09-05 14:35:36,478][svoice.solver][INFO] - ----------------------------------------------------------------------
[2024-09-05 14:35:36,479][svoice.solver][INFO] - Training...
Input shape to Encoder: torch.Size([4, 32000])
[2024-09-05 14:35:44,215][svoice.solver][INFO] - Train | Epoch 1 | 1/2 | 0.3 it/sec | Loss 24.78669
Input shape to Encoder: torch.Size([4, 32000])
[2024-09-05 14:35:49,408][svoice.solver][INFO] - Train | Epoch 1 | 2/2 | 0.2 it/sec | Loss 20.88289
[2024-09-05 14:35:49,412][svoice.solver][INFO] - Train Summary | End of Epoch 1 | Time 12.93s | Train Loss 20.88289
[2024-09-05 14:35:49,413][svoice.solver][INFO] - ----------------------------------------------------------------------
[2024-09-05 14:35:49,414][svoice.solver][INFO] - Cross validation...
Input shape to Encoder: torch.Size([1, 0])
[2024-09-05 14:35:49,591][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "/media/wesee/nv_ssd/meta/svoice/train.py", line 118, in main
    _main(args)
  File "/media/wesee/nv_ssd/meta/svoice/train.py", line 112, in _main
    run(args)
  File "/media/wesee/nv_ssd/meta/svoice/train.py", line 93, in run
    solver.train()
  File "/media/wesee/nv_ssd/meta/svoice/svoice/solver.py", line 131, in train
    valid_loss = self._run_one_epoch(epoch, cross_valid=True)
  File "/media/wesee/nv_ssd/meta/svoice/svoice/solver.py", line 195, in _run_one_epoch
    estimate_source = self.dmodel(mixture)
  File "/home/wesee/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wesee/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/wesee/nv_ssd/meta/svoice/svoice/models/swave.py", line 252, in forward
    mixture_w = self.encoder(mixture)
  File "/home/wesee/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wesee/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/wesee/nv_ssd/meta/svoice/svoice/models/swave.py", line 281, in forward
    mixture_w = F.relu(self.conv(mixture))
  File "/home/wesee/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wesee/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wesee/miniforge3/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/wesee/miniforge3/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Calculated padded input size per channel: (0). Kernel size: (8). Kernel size can't be greater than actual input size

How should I deal with this?
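
One way to narrow this down: the log shows the cross-validation batch reaching the encoder with shape torch.Size([1, 0]), that is, zero samples, while the convolution kernel has size 8. A 1-D convolution needs the padded input to be at least as long as its (dilated) kernel, so an empty validation clip would trigger exactly this message. The sketch below is not taken from the svoice code; `conv1d_output_length` and `find_short_wavs` are hypothetical helper names, the stride value in the usage note is an assumption, and the scan assumes the validation audio is stored as PCM `.wav` files readable by Python's standard `wave` module.

```python
import wave
from pathlib import Path


def conv1d_output_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution; raise if the input is too short."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    padded = length + 2 * padding
    if padded < effective_kernel:
        # Same condition that makes the convolution refuse the input.
        raise ValueError(
            f"padded input size ({padded}) is smaller than "
            f"kernel size ({effective_kernel})"
        )
    return (padded - effective_kernel) // stride + 1


def find_short_wavs(root, min_samples):
    """List (path, n_frames) for .wav files under root with too few frames.

    n_frames is None when the file could not be parsed at all
    (for example a zero-byte file or a non-PCM encoding).
    """
    short = []
    for path in sorted(Path(root).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                n_frames = wav.getnframes()
        except (wave.Error, EOFError):
            n_frames = None
        if n_frames is None or n_frames < min_samples:
            short.append((str(path), n_frames))
    return short
```

For example, `conv1d_output_length(32000, kernel_size=8, stride=4)` succeeds for the training clips in the log (assuming a stride of 4), while `conv1d_output_length(0, kernel_size=8)` raises. Running `find_short_wavs("path/to/validation/wavs", min_samples=8)` on the validation folder (placeholder path) would list any clips that are empty or shorter than the kernel; if it reports files, regenerating or removing those entries before training is a reasonable next step.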