acetylSv / GST-tacotron

Reproducing Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis (https://arxiv.org/pdf/1803.09017.pdf)

Invalid reference audio? #4

Open hyzhan opened 5 years ago

hyzhan commented 5 years ago

I am using the pre-trained models with different reference audio clips, but the resulting audio barely changes. What could be the reason for this?

acetylSv commented 5 years ago

Maybe the pre-trained model has not converged to a good point. What kinds of reference audio clips have you tried?