as-ideas / TransformerTTS

🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.
https://as-ideas.github.io/TransformerTTS/

duration not predicted correctly #95

Open theAayushbajaj opened 3 years ago

theAayushbajaj commented 3 years ago

I'm training on a custom dataset. The issue is that the mels generated by the forward model after training are not the same length as the ground truth mels. Because of this, WaveRNN cannot be trained, as some datums get corrupted during the window calculation here.

One corrupted datum looks like this (mel, label pair):

mel_shape = (80, 311)
sig_offset = 79200
label shape (ground truth signal) = (77626,)
label window shape = (0,)

As you can see, the sig_offset value exceeds the length of the signal. Is this a mistake on my part, or do you have any suggestions?

Branch: master Commit: e4ded5b
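
The reported numbers are consistent with a mel that is longer than the audio it is paired with, so any window offset derived from the mel length can land past the end of the signal. Below is a minimal sketch of a sanity check that could be run before vocoder training. It is plain Python and not code from TransformerTTS or WaveRNN; the hop length of 256, the window size of 1000, and the helper names `max_frames_for` and `find_length_mismatches` are assumptions made for illustration, so substitute the values from your own config.

```python
HOP_LENGTH = 256  # assumed samples per mel frame; take the real value from your config


def max_frames_for(signal_len, hop_length=HOP_LENGTH):
    """Upper bound on the mel frames a signal of this length should produce.

    A centered STFT typically yields signal_len // hop_length + 1 frames.
    """
    return signal_len // hop_length + 1


def find_length_mismatches(pairs, hop_length=HOP_LENGTH):
    """Flag (mel, audio) pairs whose mel is longer than the audio allows.

    pairs: iterable of (mel_frames, signal_len) tuples.
    Returns a list of (index, mel_frames, max_allowed_frames).
    """
    bad = []
    for i, (mel_frames, signal_len) in enumerate(pairs):
        limit = max_frames_for(signal_len, hop_length)
        if mel_frames > limit:
            bad.append((i, mel_frames, limit))
    return bad


# Values taken from the report above.
signal = [0.0] * 77626
sig_offset = 79200

# Slicing past the end of a sequence does not raise; it silently returns
# an empty window, which matches the reported label window shape of (0,).
window = signal[sig_offset:sig_offset + 1000]  # 1000 is an arbitrary window size
print(len(window))

# The 311-frame mel is flagged as too long for a 77626-sample signal.
print(find_length_mismatches([(311, len(signal))]))
```

Running a check like this over the whole dataset shows whether the problem is limited to a few samples or affects every predicted mel.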

cfrancesco commented 3 years ago

Hi, when producing the mels for WaveRNN (assuming you do want to use the predicted rather than the ground truth ones), you could run a validation step using the ground truth durations. In that case the predicted mels will have the same length as the ground truth mels.
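
To illustrate why feeding ground truth durations fixes the length, here is a generic sketch of duration-based expansion as used in non-autoregressive TTS models: each input token is repeated for as many frames as its duration, so the output length is the sum of the durations. This is not the TransformerTTS implementation; the function name `expand_by_durations` and the toy values are hypothetical.

```python
def expand_by_durations(token_features, durations):
    """Repeat each token's feature for `duration` frames (length regulation)."""
    if len(token_features) != len(durations):
        raise ValueError("need exactly one duration per token")
    frames = []
    for feature, duration in zip(token_features, durations):
        frames.extend([feature] * duration)
    return frames


tokens = ["h", "e", "y"]           # stand-ins for per-token encoder outputs
ground_truth_mel_len = 10          # toy ground truth mel length in frames

gt_durations = [3, 5, 2]           # extracted durations, sum to the mel length
predicted_durations = [3, 6, 3]    # duration predictor output, may drift

with_gt = expand_by_durations(tokens, gt_durations)
with_pred = expand_by_durations(tokens, predicted_durations)

# With ground truth durations the frame count matches the ground truth mel;
# with predicted durations it can differ, which is the mismatch in this issue.
print(len(with_gt) == ground_truth_mel_len)
print(len(with_pred) == ground_truth_mel_len)
```

Because the decoder output has one frame per expanded position, using the ground truth durations ties the predicted mel length to the ground truth mel length.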