Hi there!
I have been trying to train a fresh English GST-Taco model on the LibriTTS dataset, which contains (text, wav) pairs. Your implementation of this algorithm seems to be the best of the publicly available ones, and I would love to incorporate it into my project.
So far I have not managed to get it working at all. I have implemented my own get_XX_data method, swapped it into the SpeechDataset class, and generated an index file for the dataset that maps each wav to its text (roughly as sketched below).
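In case it helps, here is a rough sketch of what those two pieces look like on my side; the names (build_index, get_libritts_data) and the pipe-separated index format are just my simplified versions for illustration, not anything taken from your repo:

```python
from pathlib import Path

import librosa  # only used here to load/resample the wavs


def build_index(libritts_root, index_path):
    """Write an index file with one 'wav_path|text' line per utterance.

    LibriTTS ships a *.normalized.txt transcript next to each *.wav,
    so the index is just a flat mapping from wav path to its text.
    """
    with open(index_path, "w", encoding="utf-8") as index_file:
        for wav_path in Path(libritts_root).rglob("*.wav"):
            txt_path = wav_path.with_suffix(".normalized.txt")
            if not txt_path.exists():
                continue  # skip utterances without a transcript
            text = txt_path.read_text(encoding="utf-8").strip()
            index_file.write(f"{wav_path}|{text}\n")


def get_libritts_data(index_path, sample_rate=22050):
    """Yield (text, waveform) pairs for every entry in the index file."""
    with open(index_path, encoding="utf-8") as index_file:
        for line in index_file:
            wav_path, text = line.rstrip("\n").split("|", 1)
            # Resample to the target rate on load.
            wav, _ = librosa.load(wav_path, sr=sample_rate)
            yield text, wav
```

That index file is what I point the SpeechDataset at, so those two functions are essentially the only code I changed.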
I have run a fresh training run and there are no runtime errors; however, after a decent amount of training, the model outputs nothing beyond white noise and strange tones.
I don't have much to go on as to what could be causing this, but attached is an image of the attention output your code plots, which looks highly irregular.
I have not changed any of the hyperparameters, so that should not be the cause. Do you have any ideas as to what could be behind this failure to learn?
Any pointers would be greatly appreciated!