Hi, I am trying to run Mellotron on the Blizzard2013 dataset. I aligned the audio with an alignment tool, and each resulting clip is about 15-25 s long.
However, I am facing a parse_output error:
Traceback (most recent call last):
  File "train.py", line 286, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 210, in train
    y_pred = model(x)
  File "Desktop/py3_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "Desktop/py3_env/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "Desktop/py3_env/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "Desktop/py3_env/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "Desktop/py3_env/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 5.
Original Traceback (most recent call last):
  File "Desktop/py3_env/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "Desktop/py3_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "Desktop/PDAEmotion/mellotron/model.py", line 632, in forward
    output_lengths)
  File "Desktop/PDAEmotion/mellotron/model.py", line 603, in parse_output
    outputs[0].data.masked_fill_(mask, 0.0)
RuntimeError: The expanded size of the tensor (891) must match the existing size (349) at non-singleton dimension 2. Target sizes: [16, 80, 891]. Tensor sizes: [16, 80, 349]
From reading the paper, I know the original implementation uses audio clips shorter than 10 s. Is this problem caused by the length of the audio in my dataset, or by something else?
How should I fix this?
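In case it helps, this is a minimal sketch of how I could test the length hypothesis by dropping long clips from the training filelist before training. It assumes a Mellotron-style filelist of "wav_path|text|speaker" lines and uses the soundfile package; the file names and the 10 s cutoff are only placeholders.

import soundfile as sf

def filter_filelist(in_path, out_path, max_seconds=10.0):
    # Keep only the filelist lines whose wav clip is at most max_seconds long.
    kept, dropped = 0, 0
    with open(in_path) as fin, open(out_path, 'w') as fout:
        for line in fin:
            wav_path = line.split('|')[0]
            info = sf.info(wav_path)
            if info.frames / info.samplerate <= max_seconds:
                fout.write(line)
                kept += 1
            else:
                dropped += 1
    print('kept %d clips, dropped %d longer than %.1f s' % (kept, dropped, max_seconds))

# e.g. filter_filelist('filelists/blizzard_train.txt', 'filelists/blizzard_train_short.txt')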
Also, I changed some of the code to support multiple GPUs with DataParallel:
import torch
from numpy import finfo
from torch.nn import DataParallel
from model import Tacotron2

def load_model(hparams):
    device = torch.device('cuda:4')  # base device for the model
    model = Tacotron2(hparams).to(device)
    if hparams.fp16_run:
        model.decoder.attention_layer.score_mask_value = finfo('float16').min
    if torch.cuda.device_count() > 1:
        # split each batch across GPUs 4 and 5
        model = DataParallel(model, device_ids=[4, 5])
    return model
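For what it's worth, my current reading of the traceback (only a guess, not verified): the mel targets are padded to the max length of the whole batch (891 here) and DataParallel scatters them along the batch dimension, while the mask in parse_output seems to be built from the max of output_lengths within each replica's sub-batch (349 here), so masked_fill_ cannot broadcast the mask onto the mel tensor. A tiny standalone snippet that reproduces the same kind of error, with sizes taken from the message above and illustrative lengths:

import torch

mel_outputs = torch.zeros(16, 80, 891)            # padded to the whole-batch max
output_lengths = torch.randint(100, 350, (16,))   # this replica's sub-batch lengths
max_len = int(torch.max(output_lengths))          # < 891

ids = torch.arange(max_len)
mask = ids.unsqueeze(0) >= output_lengths.unsqueeze(1)   # [16, max_len], True on padding
mask = mask.unsqueeze(1).expand(-1, 80, -1)              # [16, 80, max_len]

mel_outputs.masked_fill_(mask, 0.0)  # RuntimeError: expanded size ... must match ...

If that is really what happens, the problem would come from splitting the batch with DataParallel rather than from the 15-25 s clips themselves, and either using the distributed training path the repo provides or building the mask from the padded mel length might be the direction to look at, but I would appreciate confirmation.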
Thank you.