LukeB42 closed this issue 6 years ago.
Same behavior here:
```
  File "train.py", line 337, in <module>
    main(**vars(parser.parse_args()))
  File "train.py", line 235, in main
    trainer.run(params['epoch_limit'])
  File "/home/stephen/src/samplernn-pytorch/trainer/__init__.py", line 57, in run
    self.call_plugins('epoch', self.epochs)
  File "/home/stephen/src/samplernn-pytorch/trainer/__init__.py", line 44, in call_plugins
    getattr(plugin, queue_name)(*args)
  File "/home/stephen/anaconda3/lib/python3.6/site-packages/torch/utils/trainer/plugins/monitor.py", line 56, in epoch
    stats['epoch_mean'] = epoch_stats[0] / epoch_stats[1]
ZeroDivisionError: division by zero
```
Duplicate of #10.
The problem is that for validation we discard the last (incomplete) minibatch so it doesn't skew the result, as it might be smaller than the rest and we average the loss over minibatches with equal weights. Specifically, if you only have one minibatch, it tries to average over an empty set, hence division by zero. This could be handled better and we're planning to do that in the near future.
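A minimal sketch of the failure mode (hypothetical function and names, not the actual plugin code): averaging the loss with equal weights after dropping the last, possibly smaller, minibatch leaves nothing to average over when there is only one minibatch.

```python
def epoch_mean(batch_losses, drop_last=True):
    """Average per-minibatch losses with equal weights.

    Mimics (loosely) how the epoch mean is computed: the last,
    possibly incomplete, minibatch is discarded so it doesn't
    skew the result.
    """
    if drop_last:
        batch_losses = batch_losses[:-1]
    # With only one minibatch, the list is now empty and this
    # divides by zero -- the same symptom as in the traceback above.
    return sum(batch_losses) / len(batch_losses)
```

With two or more minibatches this works fine; with exactly one it raises `ZeroDivisionError`.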
@koz4k thanks for the response, but what do you suggest for fixing this myself in the meantime? `return`ing if `args` is empty doesn't work, and wrapping the function body in a `try`/`except` causes the program to exit at around the 1,000-exception mark.
Sorry, I was wrong - this is related to the size of the training set, not validation set. Either way, the solution is to lower the batch size or use a bigger dataset. I would recommend a bigger dataset, because with such a small one you might not be able to achieve good results anyway.
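To make the sizing concrete (an illustrative helper, not code from the repo): only complete minibatches are kept, so the batch size must be small enough that the training set yields at least one of them.

```python
def num_full_minibatches(n_examples, batch_size):
    # Only complete minibatches are kept; the trailing partial
    # one is dropped so it doesn't skew the averaged loss.
    return n_examples // batch_size
```

For example, 40 training chunks with `batch_size=64` yields zero full minibatches (and the division-by-zero above), while `batch_size=16` yields two.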
@koz4k OK, thanks for explaining that.
@koz4k Following your suggestion, using

```
python train.py --exp TEST --frame_sizes 16 4 --n_rnn 2 --dataset custom --batch_size 64
```

I'm getting the following result:
```
Traceback (most recent call last):
  File "train.py", line 360, in <module>
    main(**vars(parser.parse_args()))
  File "train.py", line 258, in main
    trainer.run(params['epoch_limit'])
  File "pytorch-samplernn/trainer/__init__.py", line 56, in run
    self.train()
  File "pytorch-samplernn/trainer/__init__.py", line 61, in train
    enumerate(self.dataset, self.iterations + 1):
  File "pytorch-samplernn/dataset.py", line 51, in __iter__
    for batch in super().__iter__():
  File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 188, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 96, in default_collate
    return torch.stack(batch, 0, out=out)
  File "/usr/local/lib/python3.6/site-packages/torch/functional.py", line 64, in stack
    return torch.cat(inputs, dim)
RuntimeError: inconsistent tensor sizes at /pytorch/torch/lib/TH/generic/THTensorMath.c:2864
```
What do you suggest I do to fix this for the time being?
Are you sure that all the `.wav` files in your dataset directory have the same duration?
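One quick way to check (a standalone sketch using only the standard library; the directory path is just an example):

```python
import glob
import wave

def wav_durations(directory):
    """Map each .wav file in `directory` to its duration in seconds."""
    durations = {}
    for path in sorted(glob.glob(f"{directory}/*.wav")):
        with wave.open(path, "rb") as f:
            durations[path] = f.getnframes() / f.getframerate()
    return durations

# Example usage: flag files whose duration differs from the first one.
# durs = wav_durations("datasets/custom")
# first = next(iter(durs.values()))
# bad = {p: d for p, d in durs.items() if d != first}
```

Any file in `bad` would produce tensors of a different size and trigger the `inconsistent tensor sizes` error when `torch.stack` collates the batch.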
@comeweber @koz4k Many thanks for your help, both of you. It's now stably training, using wav files that are 8 seconds long and a `--batch_size` of 32.
@LukeB42 Could you share your file structure for the custom folder? I can't use youtube-dl to generate the training data right now, so I downloaded an audio file myself. Although I have 8-second chunks, training fails with the following errors:
```
Traceback (most recent call last):
  File "train.py", line 360, in
```
You most likely have chunks that are not exactly equal in length. Many tools for chunking audio files tend to do that. You can use ffmpeg; it cuts files cleanly. See the downloading script for an example.
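As an alternative to ffmpeg, a pure-Python sketch (standard library only, not the repo's downloading script) that cuts a wav into exactly equal-length chunks, dropping the ragged tail so every output file has the same duration:

```python
import wave

def chunk_wav(src, dst_pattern, chunk_seconds=8):
    """Split `src` into equal pieces of `chunk_seconds` each.

    Output files are named via `dst_pattern`, e.g. 'chunk_{:03d}.wav'
    (a hypothetical naming scheme). The trailing partial chunk is
    dropped so all outputs have exactly the same number of frames.
    Returns the number of chunks written.
    """
    with wave.open(src, "rb") as f:
        params = f.getparams()
        frames_per_chunk = int(f.getframerate() * chunk_seconds)
        n_chunks = f.getnframes() // frames_per_chunk
        for i in range(n_chunks):
            data = f.readframes(frames_per_chunk)
            with wave.open(dst_pattern.format(i), "wb") as out:
                out.setparams(params)
                out.writeframes(data)
    return n_chunks
```

Equal-length chunks guarantee that every item the `DataLoader` collates has the same tensor size, avoiding the `inconsistent tensor sizes` error above.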
This is with PyTorch 0.3.0.post4.