asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers
https://asteroid-team.github.io/
MIT License

Unequal tensor sizes in asteroid/engine/system.py #352

Closed chaoxiefs closed 3 years ago

chaoxiefs commented 3 years ago

Hi, I'm back again, and I'm sorry to bother you. I'm trying to train DCCRN on my own dataset but got stuck here:

    main(arg_dic)
  File "test_run_training.py", line 96, in main
    trainer.fit(system)
  File "/home/dccrn/tools/venv/lib/python3.6/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
    result = fn(self, *args, **kwargs)
  File "/home/dccrn/tools/venv/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1064, in fit
    results = self.accelerator_backend.train()
  File "/home/dccrn/tools/venv/lib/python3.6/site-packages/pytorch_lightning/accelerators/dp_backend.py", line 97, in train
    results = self.trainer.run_pretrain_routine(model)
  File "/home/dccrn/tools/venv/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1239, in run_pretrain_routine
    self.train()
  File "/home/dccrn/tools/venv/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 394, in train
    self.run_training_epoch()
  File "/home/dccrn/tools/venv/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 516, in run_training_epoch
    self.run_evaluation(test_mode=False)
  File "/home/dccrn/tools/venv/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 582, in run_evaluation
    eval_results = self._evaluate(self.model, dataloaders, max_batches, test_mode)
  File "/home/dccrn/tools/venv/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 396, in _evaluate
    eval_results = self.__run_eval_epoch_end(test_mode, outputs, dataloaders, using_eval_result)
  File "/home/dccrn/tools/venv/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 490, in __run_eval_epoch_end
    eval_results = model.validation_epoch_end(eval_results)
  File "/home/dccrn/asteroid/asteroid/engine/system.py", line 151, in validation_epoch_end
    avg_loss = torch.stack([x["val_loss"] for x in outputs]).mean()
RuntimeError: stack expects each tensor to be equal size, but got [3] at entry 0 and [2] at entry 2

I'm using my own dataset and want to train DCCRN for denoising. Each dataset item has the structure [noisy speech, clean speech].

I followed the instructions in #278, with two differences:

First, I used the SI-SNR loss function: loss_func = PITLossWrapper(SingleSrcNegSDR("sisdr"), pit_from='pw_pt')

Second, because this is a denoising task, I removed the mixture from SimpleSystem, so the code looks like this:

from asteroid.engine.system import System

class SimpleSystem(System):
    def common_step(self, batch, batch_nb, train):
        # Each batch is a (noisy speech, clean speech) pair.
        noisy, target = batch
        estimate = self(noisy)
        # The loss function can be something like
        # loss_func = partial(distance, is_complex=some_bool)
        loss = self.loss_func(estimate, target)
        return loss

Could you please tell me how to fix this? I would really appreciate it!

chaoxiefs commented 3 years ago

OK, I've solved this problem. It finally runs.

I wasn't using drop_last in my DataLoader. But I still don't understand why that matters...

jonashaag commented 3 years ago

The reason is that batch sizes may be unequal unless you use drop_last. If you use a batch size of 3 and have n * 3 + 2 items in the dataset, the last batch will have size 2. The val losses will then have different sizes, and torch.stack can't stack them.
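To make this concrete, here is a minimal sketch that reproduces the mismatch with a toy dataset of 3 * 2 + 2 = 8 items. The data, the helper `per_item_losses`, and the squared-error loss are all made up for illustration. It also assumes each validation step returns one loss value per item, which is what the shapes in the traceback suggest but which the thread does not confirm.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset with 3 * 2 + 2 = 8 items (hypothetical data, not the issue's dataset).
dataset = TensorDataset(torch.randn(8, 4), torch.randn(8, 4))

def per_item_losses(loader):
    # Assumption: each validation step yields one loss value per item in the batch.
    return [((x - y) ** 2).mean(dim=1) for x, y in loader]

# Without drop_last, the final batch only holds the 2 leftover items.
ragged = per_item_losses(DataLoader(dataset, batch_size=3))
ragged_sizes = [t.shape[0] for t in ragged]

try:
    torch.stack(ragged)  # tensors of shape [3], [3], [2] cannot be stacked
    stack_failed = False
except RuntimeError:
    stack_failed = True

# With drop_last=True the incomplete batch is skipped, so every loss has shape [3].
even = per_item_losses(DataLoader(dataset, batch_size=3, drop_last=True))
avg_loss = torch.stack(even).mean()
```

Note that `drop_last=True` silently discards the leftover items, so on a small validation set the reported loss covers slightly fewer examples.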

We could deal with this better but I'm not sure if it's worth it.
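As a rough idea of what handling this could look like, the sketch below concatenates the losses instead of stacking them, so batches of different sizes are allowed. `average_val_loss` is a hypothetical helper, not part of asteroid, and it is not a change the maintainers proposed. It assumes `outputs` is a list of dicts with a `"val_loss"` tensor, as in the traceback.

```python
import torch

def average_val_loss(outputs):
    # Hypothetical helper, not part of asteroid.
    # Flatten each loss to 1-D so scalar and per-item losses are both accepted,
    # then concatenate; unlike torch.stack, torch.cat allows different lengths.
    losses = [x["val_loss"].reshape(-1) for x in outputs]
    return torch.cat(losses).mean()

# Three validation outputs where the last batch is smaller, as in the issue.
outputs = [
    {"val_loss": torch.tensor([1.0, 2.0, 3.0])},
    {"val_loss": torch.tensor([4.0, 5.0, 6.0])},
    {"val_loss": torch.tensor([7.0, 8.0])},
]
avg_loss = average_val_loss(outputs)
```

Averaging over the concatenated values weights every item equally, whereas averaging per-batch means would give the items in a short final batch more weight.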