fatchord / WaveRNN

WaveRNN Vocoder + TTS
https://fatchord.github.io/model_outputs/
MIT License
2.14k stars · 698 forks

Attention blank, is it because of my progressive training schedule? #223

Open lilmadman007 opened 3 years ago

lilmadman007 commented 3 years ago

My attention plot is blank after 10k steps, which shouldn't be normal. I'm using the LJSpeech dataset. This is the second time I've preprocessed everything and trained.

[screenshot: blank attention alignment plot]

Loss is around 1.0 at 10k steps. Are my settings wrong here? Does this not work?

[screenshot: hparams/training settings]

Thanks!

NOTE: I LOOKED AT THIS ISSUE ALREADY -> https://github.com/fatchord/WaveRNN/issues/154

fatchord commented 3 years ago

Hi, sometimes the alignment will fail randomly. I've never tried with a batch size of 8, so that could be it. Maybe try fine-tuning one of the pretrained models.

AhmadAlAmin21 commented 3 years ago

did you ever solve this?

lilmadman007 commented 3 years ago

> did you ever solve this?

Sorry for the lack of feedback. No, I did not. When fatchord commented that it fails sometimes, I tried it again two more times, but it just didn't work. Maybe my GPU just isn't good enough, like I said, but I moved on when I couldn't get results. Any help would be appreciated anyway!

AhmadAlAmin21 commented 3 years ago

I think I found a solution:

1. Increase `r` from 7 to 12 in the `tts_schedule` in `hparams.py`.
2. In `models/tacotron.py`, change line 200 from `scores = torch.sigmoid(u) / torch.sigmoid(u).sum(dim=1, keepdim=True)` to `scores = F.softmax(u, dim=1)`.

got this from https://github.com/fatchord/WaveRNN/issues/154#issuecomment-567851857
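To see why that one-line change can help, here is a minimal NumPy sketch (not the repo's actual code, which uses PyTorch) comparing the two normalizations. Sigmoid-then-normalize squashes all energies into (0, 1) before dividing, so the resulting distribution over encoder timesteps stays relatively flat; softmax exponentiates the raw energies, producing a much peakier distribution that can help the alignment latch onto one encoder position early in training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_norm_scores(u):
    # Original line 200: sigmoid each energy, then normalize over time,
    # mirroring torch.sigmoid(u) / torch.sigmoid(u).sum(dim=1, keepdim=True).
    s = sigmoid(u)
    return s / s.sum(axis=1, keepdims=True)

def softmax_scores(u):
    # Suggested replacement: plain softmax over the time axis,
    # mirroring F.softmax(u, dim=1) (with max-subtraction for stability).
    e = np.exp(u - u.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy attention energies: batch of 1, five encoder timesteps.
u = np.array([[2.0, 0.5, -1.0, 0.0, 3.0]])
a = sigmoid_norm_scores(u)
b = softmax_scores(u)
# Both rows sum to 1, but the softmax row concentrates far more
# mass on the highest-energy timestep than the sigmoid-normalized row.
```

Running this on the toy energies shows the softmax distribution putting most of its mass on the strongest timestep, while the sigmoid-normalized version spreads attention almost uniformly, which is consistent with the blank/diffuse alignment plots reported above.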