NVIDIA / flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
https://nv-adlr.github.io/Flowtron
Apache License 2.0

Single word input leads to ValueError: Expected more than 1 spatial element when training, got input size torch.Size([1, 512, 1]) #149

Open Graf-D opened 2 years ago

Graf-D commented 2 years ago

Hello!

I get the following exception on the model forward pass: ValueError: Expected more than 1 spatial element when training, got input size torch.Size([1, 512, 1]). The call is z, log_s_list, gate_pred, attn, attn_logprob, mean, log_var, prob = model(mel, spk_ids, txt, in_lens, out_lens, attn_prior).

I've discovered that this exception occurs if the whole text (txt) is a single word (e.g. "what"). I found some mentions of this problem in this repo's issues but no solution was provided.

The full traceback:

  File "/slot/sandbox/nv_tmpfs/d/in/script/0_script_unpacked/s2t_probs/flowtron_score_text.py", line 147, in <module>
    attn_logprob, mean, log_var, prob) = model(
  File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/slot/sandbox/nv_tmpfs/d/in/script/0_script_unpacked/s2t_probs/flowtron/flowtron.py", line 818, in forward
    text = self.encoder(text, in_lens)
  File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/slot/sandbox/nv_tmpfs/d/in/script/0_script_unpacked/s2t_probs/flowtron/flowtron.py", line 436, in forward
    F.relu(conv(curr_x)),
  File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/instancenorm.py", line 57, in forward
    return F.instance_norm(
  File "/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2326, in instance_norm
    _verify_spatial_size(input.size())
  File "/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2293, in _verify_spatial_size
    raise ValueError("Expected more than 1 spatial element when training, got input size {}".format(size))
ValueError: Expected more than 1 spatial element when training, got input size torch.Size([1, 512, 1])

I understand why passing a sequence of length 1 through an instance norm raises an exception. Is there a way to make it work, though? For example, could I add a special token to the string?
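To make the "special token" idea concrete, here is a minimal sketch of what I have in mind. It pads the encoded text to at least two tokens before it reaches the encoder. The names pad_to_min_length, PAD_ID and MIN_LEN are hypothetical and not part of Flowtron, and the pad id is an assumption: it would need to be an id that the symbol table actually treats as padding or silence.

```python
from typing import List, Tuple

# Hypothetical values, not taken from Flowtron:
PAD_ID = 0   # assumed id of a padding/silence symbol in the symbol table
MIN_LEN = 2  # the error asks for more than 1 spatial element


def pad_to_min_length(
    token_ids: List[int],
    min_len: int = MIN_LEN,
    pad_id: int = PAD_ID,
) -> Tuple[List[int], int]:
    """Right-pad token_ids with pad_id until it has at least min_len entries.

    Returns the padded list and its length, so the caller can pass the
    new length wherever the original input length was used.
    """
    padded = list(token_ids)  # copy, so the caller's list is not modified
    if len(padded) < min_len:
        padded.extend([pad_id] * (min_len - len(padded)))
    return padded, len(padded)


# A one-token text becomes a two-token text; longer texts are unchanged.
print(pad_to_min_length([7]))        # ([7, 0], 2)
print(pad_to_min_length([4, 5, 6]))  # ([4, 5, 6], 3)
```

The returned length would have to replace the corresponding entry of in_lens so that the text tensor and its length stay consistent. I have not verified that this avoids the error in Flowtron, or how the extra token affects the attention and the resulting scores.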

lunalulu commented 1 year ago

Any progress? Thanks.