KinglittleQ / GST-Tacotron

A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
MIT License
362 stars, 72 forks

Unable to train fresh English GST-Taco with LibriTTS dataset #15

Closed shaneacton closed 3 years ago

shaneacton commented 3 years ago

Hi There!

I have been trying to train a fresh English GST-Taco using the LibriTTS dataset, which contains (text, wav) pairs. Your implementation of this algorithm seems to be the best of the publicly available ones, and I would love to be able to incorporate it into my project.

So far I have not managed to get it working at all. I have implemented my get_XX_data method, swapped it into the SpeechDataset class and generated an index file for the dataset which maps the wavs to the text pieces.
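For reference, here is a minimal sketch of how such an index file could be generated. It assumes the standard LibriTTS layout, where each `.wav` sits next to a `.normalized.txt` transcript with the same stem. The function name `build_index` and the `wav_path|text` line format are illustrative assumptions, not something this repository prescribes.

```python
from pathlib import Path


def build_index(root, index_path):
    """Write one 'relative/wav/path|normalized text' line per utterance.

    Assumes each LibriTTS wav has a sibling '<stem>.normalized.txt' file.
    The pipe-delimited format is a hypothetical choice for illustration.
    Returns the number of utterances written.
    """
    root = Path(root)
    count = 0
    with open(index_path, "w", encoding="utf-8") as out:
        for wav in sorted(root.rglob("*.wav")):
            txt = wav.with_name(wav.stem + ".normalized.txt")
            if not txt.exists():
                continue  # skip audio that has no transcript
            # Collapse newlines and repeated spaces into single spaces.
            text = " ".join(txt.read_text(encoding="utf-8").split())
            if not text:
                continue  # skip empty transcripts
            out.write(f"{wav.relative_to(root).as_posix()}|{text}\n")
            count += 1
    return count
```

Skipping utterances with a missing or empty transcript keeps mismatched (text, wav) pairs out of the training set, which is one thing worth ruling out when the model fails to learn.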

I have run a fresh training session and there are no runtime errors. However, after a decent amount of training, the system does not output anything beyond white noise and weird tones.

I don't have much to go on as to what could be causing this, but attached is an image of the attention output you plot, which looks highly unusual. [image: attention plot]

I have not changed any of the hyperparameters, so a changed setting should not be the cause. Do you have any ideas as to what could be causing this failure to learn?

Any pointers would be greatly appreciated!

KinglittleQ commented 3 years ago

Have you checked if the normalized text is correct?
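One way to check this is to scan the normalized transcripts for characters that the model's symbol set does not contain. The sketch below is only an illustration: `VOCAB` is a hypothetical lowercase symbol set and `out_of_vocab` is a made-up helper name, so substitute the vocabulary your hyperparameter file actually defines.

```python
from collections import Counter

# Hypothetical symbol set (an assumption); replace it with the
# vocabulary defined in your own configuration.
VOCAB = set("abcdefghijklmnopqrstuvwxyz '.?")


def out_of_vocab(texts, vocab=VOCAB):
    """Count every character in the texts that is missing from vocab."""
    unknown = Counter()
    for text in texts:
        unknown.update(ch for ch in text if ch not in vocab)
    return unknown


if __name__ == "__main__":
    samples = ["hello world.", "Dr. Smith paid $5"]
    # Lists each unexpected character with its frequency.
    for ch, n in out_of_vocab(samples).most_common():
        print(repr(ch), n)
```

If this reports many uppercase letters, digits, or punctuation marks, the text may need lowercasing or further cleaning before it is mapped to symbol IDs.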