as-ideas / ForwardTacotron

⏩ Generating speech in a single forward pass without any attention!
https://as-ideas.github.io/ForwardTacotron/
MIT License

gen_forward empty TensorList #13

Closed: Coayer closed this issue 4 years ago

Coayer commented 4 years ago

Hi, I've followed the steps in the "Training your own model" section of the README, but I can't run gen_forward.py:

```
python gen_forward.py --alpha 1 --input_text "this is whatever you want it to be" griffinlim
Using device: cuda

Initialising Forward TTS Model...

+----------+--------------+----------+
| Tacotron | Vocoder Type | GL Iters |
+----------+--------------+----------+
|   10k    | Griffin-Lim  |    32    |
+----------+--------------+----------+

| Generating 1/1
Traceback (most recent call last):
  File "gen_forward.py", line 142, in <module>
    m, _ = tts_model.generate(x, alpha=args.alpha)
  File "/home/user/Documents/vocal_synthesis/models/forward_tacotron.py", line 165, in generate
    x, _ = self.lstm(x)
  File "/home/user/Documents/vocal_synthesis/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/Documents/vocal_synthesis/venv/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 570, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: stack expects a non-empty TensorList
```

I tried running gen_tacotron.py and it ran without error, but the file it produced seemed far too long and sounded nothing like speech.

If it's relevant, I didn't get very far into the 289,000-step train_forward run, but the loss wasn't decreasing much anyway.

Thanks :)

cschaefer26 commented 4 years ago

Hi, this sounds as if your Tacotron model hasn't learned any attention, so the durations extracted from it are wrong. Could you check the attention plot in the Tacotron TensorBoard?
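Why wrong durations can end in an empty TensorList can be sketched in plain Python. This assumes a simple scheme in which each phoneme's duration is the number of mel frames whose strongest attention weight falls on that phoneme, plus a length regulator that repeats each phoneme's encoder output by its duration. The function names are hypothetical and the repository's own extraction code may differ. With a flat alignment that never moves along the text, most phonemes get a duration of 0; if a duration predictor trained on such targets predicts 0 for every phoneme, the expanded sequence is empty, which is consistent with the LSTM receiving a zero-length input.

```python
from typing import List, Sequence


def durations_from_attention(attn: Sequence[Sequence[float]],
                             n_phonemes: int) -> List[int]:
    """Hypothetical helper: per-phoneme duration from a Tacotron alignment.

    attn[t][p] is the attention weight of mel frame t on phoneme p. Each frame
    is assigned to the phoneme it attends to most strongly, and the frames
    assigned to each phoneme are counted.
    """
    durations = [0] * n_phonemes
    for frame in attn:
        best = max(range(n_phonemes), key=lambda p: frame[p])
        durations[best] += 1
    return durations


def length_regulate(encoded: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Hypothetical helper: repeat each phoneme's encoder output by its duration.

    A duration of 0 drops the phoneme entirely, so all-zero durations give an
    empty sequence.
    """
    expanded: List[str] = []
    for vec, d in zip(encoded, durations):
        expanded.extend([vec] * d)
    return expanded


if __name__ == "__main__":
    phonemes = ["a", "b", "c"]  # stand-ins for encoder output vectors

    # Crisp diagonal alignment: two frames per phoneme.
    diagonal = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
    print(durations_from_attention(diagonal, 3))  # [2, 2, 2]

    # Flat alignment that never moves past the first phoneme.
    flat = [[0.4, 0.3, 0.3]] * 6
    print(durations_from_attention(flat, 3))  # [6, 0, 0]

    # If every predicted duration is 0, nothing is left to feed the LSTM.
    print(length_regulate(phonemes, [0, 0, 0]))  # []
```

A healthy alignment spreads frames across all phonemes, while a collapsed one piles them onto a single phoneme and leaves the rest at zero.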

Coayer commented 4 years ago

The attention image is pretty much solid dark purple with some very faint lines, so I think you're right! Should I start training a new model?
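If you want a number to go with the plot, one rough heuristic (an assumption, not something the repository is known to log) is the average peak attention weight per mel frame. The sketch below assumes the alignment is available as a frames-by-phonemes matrix whose rows sum to 1; `attention_sharpness` is a hypothetical name.

```python
from typing import Sequence


def attention_sharpness(attn: Sequence[Sequence[float]]) -> float:
    """Hypothetical check: mean over mel frames of the largest attention weight.

    attn[t][p] is the weight of mel frame t on phoneme p (each row sums to 1).
    A crisp diagonal alignment scores close to 1.0; weights spread evenly over
    n phonemes score about 1 / n, which renders as a nearly uniform dark plot.
    """
    if not attn:
        raise ValueError("attention matrix is empty")
    return sum(max(frame) for frame in attn) / len(attn)


if __name__ == "__main__":
    # Diagonal alignment: 8 frames over 4 phonemes, two frames per phoneme.
    diagonal = [[1.0 if p == t // 2 else 0.0 for p in range(4)] for t in range(8)]
    # No attention learned: every frame spreads its weight evenly.
    uniform = [[0.25] * 4 for _ in range(8)]
    print(attention_sharpness(diagonal))  # 1.0
    print(attention_sharpness(uniform))   # 0.25
```

A score stuck near 1 / n_phonemes over many training steps matches the "solid dark purple" plot described above. Note that a high score alone does not prove the alignment is diagonal, so it complements the plot rather than replacing it.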

cschaefer26 commented 4 years ago

Yeah, you need to retrain the Tacotron model. What dataset are you using? Taco has trouble building attention when the snippets begin with pauses.
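Because leading pauses can keep attention from forming, one option is to trim leading silence from each clip before preprocessing. The sketch below uses a simple amplitude threshold and assumes float samples in [-1, 1]. The function name, the 0.01 threshold, and the 256-sample frame length are hypothetical defaults that would need tuning for a real dataset.

```python
from typing import List, Sequence


def trim_leading_silence(samples: Sequence[float], threshold: float = 0.01,
                         frame_len: int = 256) -> List[float]:
    """Hypothetical helper: drop leading frames whose peak |amplitude| < threshold.

    Works frame by frame, so the first frame containing any sample at or above
    the threshold is kept whole. An entirely silent clip returns an empty list.
    """
    start = 0
    while start < len(samples):
        frame = samples[start:start + frame_len]
        if max(abs(s) for s in frame) >= threshold:
            break
        start += frame_len
    return list(samples[start:])


if __name__ == "__main__":
    # Two silent frames followed by the start of speech.
    clip = [0.0] * 512 + [0.5, -0.2, 0.1]
    print(len(trim_leading_silence(clip)))  # 3
```

Audio libraries offer ready-made versions of this; for example, `librosa.effects.trim` removes both leading and trailing silence using a decibel threshold (`top_db`).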

Coayer commented 4 years ago

I'm using my own dataset, which definitely has some pauses at the start of the clips. Thanks so much for the tip, I'll give it a shot.