nav99een opened 2 years ago
This is due to the PreNet dropout: the network becomes dependent on the dropout and does not generalize to it being turned off. Tacotron2 is therefore not deterministic by design (at least as implemented in this repo).
@PiotrDabkowski, is there a specific reason the PreNet dropout differs from regular dropout?
It disrupts feedback. Mel-spectrogram frames are highly similar, so the network can do quite well on the loss function simply by predicting the previous frame. If you remove the dropout, this can lead to positive feedback. I guess the solution here would be to randomly turn off the PreNet dropout during training.
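For context, the key difference is that Tacotron2-style implementations keep the PreNet dropout active even at inference (in PyTorch this is typically done with `F.dropout(x, p=0.5, training=True)`), whereas regular dropout is disabled in eval mode. A minimal stdlib sketch of the distinction — the function names here are illustrative, not from the repo:

```python
import random

def dropout(values, p, active):
    # Standard (inverted) dropout: zero each value with probability p
    # and rescale the survivors by 1/(1-p); a no-op when inactive.
    if not active:
        return list(values)
    return [0.0 if random.random() < p else v / (1.0 - p) for v in values]

def regular_layer(x, training):
    # Regular dropout: only active during training, identity at inference.
    return dropout(x, p=0.5, active=training)

def prenet_layer(x, training):
    # PreNet-style dropout: active regardless of the training flag,
    # so inference stays stochastic unless the RNG is seeded.
    return dropout(x, p=0.5, active=True)
```

At inference (`training=False`) the regular layer passes its input through unchanged, while the PreNet layer still zeroes values at random — which is why repeatedly synthesizing the same text can produce different spectrograms and alignments.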
Also, you can use a manual seed during inference to get the same results.
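Since the only randomness at inference comes from the PreNet dropout, fixing the RNG seed before each synthesis call makes the output reproducible. In PyTorch that would mean calling `torch.manual_seed(seed)` before `model.inference(...)`; sketched here with the stdlib RNG so the idea is self-contained:

```python
import random

def stochastic_inference(seed=None):
    # Stand-in for a synthesis call whose PreNet dropout draws from the RNG.
    # Re-seeding before every call pins down the dropout masks, so the
    # same input yields the same output across runs.
    if seed is not None:
        random.seed(seed)
    return [random.random() for _ in range(5)]
```

Calling it twice with the same seed returns identical values; without re-seeding, each call draws fresh randomness and the results differ.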
Here are useful insights about dropout in PreNet: https://github.com/mozilla/TTS/issues/50
Hi everyone, I am trying to generate speech from text using Tacotron2 fine-tuned on my custom dataset. As you can see from the image attached below, during inference I get mel-spectrograms with different shapes and alignments for the same input sequence.
I tried removing the dropout layer in the PreNet, but then the model produces no alignment at all. I tried almost all the other suggested approaches, but they failed too. I also tried running inference under torch.no_grad(). I don't want to generate new alignments every time. Please help me out. Thanks in advance!