NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Inference error #85

Closed BabaiLi closed 3 years ago

BabaiLi commented 3 years ago

When I run inference.ipynb, I get an error in cell In[16]:

```python
# Style Transfer (Rhythm and Pitch Contour)
with torch.no_grad():
    # get rhythm (alignment map) using tacotron 2
    mel_outputs, mel_outputs_postnet, gate_outputs, rhythm = mellotron.forward(x)
    rhythm = rhythm.permute(1, 0, 2)
```


```
RuntimeError                              Traceback (most recent call last)
<ipython-input-16> in <module>
      1 with torch.no_grad():
      2     # get rhythm (alignment map) using tacotron 2
----> 3     mel_outputs, mel_outputs_postnet, gate_outputs, rhythm = mellotron.forward(x)
      4     rhythm = rhythm.permute(1, 0, 2)

/datas/mellotron/model.py in forward(self, inputs)
    600
    601         embedded_inputs = self.embedding(inputs).transpose(1, 2)
--> 602         embedded_text = self.encoder(embedded_inputs, input_lengths)
    603         embedded_speakers = self.speaker_embedding(speaker_ids)[:, None]
    604         embedded_gst = self.gst(targets, output_lengths)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

/datas/mellotron/model.py in forward(self, x, input_lengths)
    186     def forward(self, x, input_lengths):
    187         for conv in self.convolutions:
--> 188             x = F.dropout(F.relu(conv(x)), drop_rate, self.training)
    189
    190         x = x.transpose(1, 2)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94

/datas/mellotron/layers.py in forward(self, signal)
     34
     35     def forward(self, signal):
---> 36         conv_signal = self.conv(signal)
     37         return conv_signal
     38

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in forward(self, input)
    198                         _single(0), self.dilation, self.groups)
    199         return F.conv1d(input, self.weight, self.bias, self.stride,
--> 200                         self.padding, self.dilation, self.groups)

RuntimeError: Calculated padded input size per channel: (4). Kernel size: (5). Kernel size can't be greater than actual input size
```
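For context (not part of the original report): the error fires because a Conv1d's padded input must be at least as long as its kernel. The output-length formula below is the one documented for `torch.nn.Conv1d`; the kernel size of 5 comes straight from the traceback, and the concrete lengths are illustrative:

```python
import math

def conv1d_output_length(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a torch.nn.Conv1d applied to an input of length l_in."""
    return math.floor((l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1

# The traceback reports a *padded* input size of 4 against a kernel of 5;
# the padding is already counted there, so pass padding=0 here.
print(conv1d_output_length(4, kernel_size=5))  # 0 -> invalid, PyTorch raises
print(conv1d_output_length(5, kernel_size=5))  # 1 -> the minimum workable length
```

In other words, any input whose padded length is below 5 cannot pass through Mellotron's encoder convolutions.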

The failure seems to come from self.conv(signal) in mellotron/layers.py, but I didn't change anything before running the code. Can you give me some suggestions? My dataset is THCHS-30, a Mandarin corpus.
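One plausible reading (an assumption, not confirmed in the thread): if the encoder convs pad by `(kernel_size - 1) // 2 = 2` on each side, a padded size of 4 implies the encoded text had length 0, i.e. the text cleaners stripped every symbol of the Mandarin input. A guard along these lines would surface that before the Conv1d does; the function and threshold are illustrative, not part of Mellotron:

```python
def check_encoded_text(encoded_text, min_len=1):
    """Hypothetical sanity check: fail loudly if text encoding produced an
    empty (or too-short) symbol sequence, e.g. because English-only text
    cleaners dropped all Mandarin characters."""
    if len(encoded_text) < min_len:
        raise ValueError(
            f"Encoded text has {len(encoded_text)} symbol(s) (need >= {min_len}); "
            "check that the text cleaners match your corpus."
        )
    return encoded_text
```

Running such a check on the output of the text-to-sequence step before calling `mellotron.forward(x)` turns the opaque Conv1d error into a direct hint about preprocessing.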

BabaiLi commented 3 years ago

The problem has been solved.