Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License
2.28k stars 905 forks

Tacotron with M-AILABS Italian Male Voice #479

Open simobrazz opened 4 years ago

simobrazz commented 4 years ago

Hello guys,

first of all thanks for the great work done in this repo.

I am trying to synthesize the M-AILABS Italian male voice using only Tacotron + Griffin-Lim on mel spectrograms. After 30k training iterations the speech is clearly intelligible, but it sounds rather metallic.

Has anyone tried to synthesize this voice, or does anyone have an idea of how to tune the hparams for it?

Any help is appreciated

Pavel185 commented 4 years ago

Hello, I tried to train this model on the M-AILABS French dataset (\fr_FR\male\zeckou\l_ile_mysterieuse). After 100k steps the voice also seems a little more metallic than the original one (the loss is around 0.82). Another problem: some accented characters (like é) are sometimes not pronounced correctly. In another Tacotron implementation (https://github.com/keithito/tacotron) one could edit tacotron/text/symbols.py and change the _characters variable to a string containing the characters in your data (like é, î...). In the present implementation I did not find such a possibility. Thanks in advance for any suggestions.
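For reference, here is a minimal sketch of what such an extended symbol table could look like. It is modeled loosely on the `_characters` idea from keithito/tacotron and is not taken from this repo. The names `_accented`, `text_to_sequence` and `unknown_characters` are hypothetical, and the list of accented letters is an assumption that should be replaced with whatever actually occurs in your transcripts.

```python
# Hypothetical symbol table; a sketch, not the layout of any specific repo.
_pad = "_"
_eos = "~"
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
# Assumption: accented letters expected in French/Italian transcripts.
_accented = "àâçéèêëìîïòôùûüÿÀÂÇÉÈÊËÌÎÏÒÔÙÛÜ"
_punctuation = "!'(),-.:;? "

_characters = _letters + _accented + _punctuation
symbols = [_pad, _eos] + list(_characters)
symbol_to_id = {s: i for i, s in enumerate(symbols)}


def text_to_sequence(text):
    """Map each known character to its id and append the end-of-sequence id.

    Characters missing from the table are skipped silently.
    """
    ids = [symbol_to_id[c] for c in text if c in symbol_to_id]
    return ids + [symbol_to_id[_eos]]


def unknown_characters(text):
    """Return the sorted characters of `text` that the table does not cover."""
    return sorted({c for c in text if c not in symbol_to_id})
```

Running `unknown_characters` over the whole transcript file before training is a cheap way to check that no accented letter is being dropped before it reaches the embedding.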

simobrazz commented 4 years ago

@Pavel185 I have the same problem with the Italian characters è, é, à, ì, etc. I think you can change the character set by adding a custom cleaner, i.e. a cleaner that doesn't remove special characters. That way the CharactersEmbedding should include all the characters we want. Does that sound good to you?

Moreover, I trained only the Tacotron part for about 40K iterations, reaching a loss of about 0.50. I got there by not using predict_linear=True, which seems to have some convergence problems. I tried to train the WaveNet part separately, but I ran into the LossExplosion problem. Did you try this part of the training?

I also found this thesis very interesting: https://www.csd.uoc.gr/~sspl/MSc/Sisamaki.pdf and I tuned the hparams of my training in accordance with it. Did you try anything different for the hparams?

I hope we can share knowledge to make progress on this project. If you prefer, we can exchange contact details to share more useful information (or audio samples).
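To make the idea concrete, a rough sketch of such a cleaner is below. The function name `accent_preserving_cleaner` is hypothetical and this is not the cleaner interface of this repo, just an illustration using the standard library. It assumes the only normalization wanted is Unicode composition, lowercasing, and whitespace collapsing, with no transliteration to ASCII.

```python
import re
import unicodedata

_whitespace_re = re.compile(r"\s+")


def accent_preserving_cleaner(text):
    """Normalize text without stripping accented characters.

    Hypothetical helper: composes accents, lowercases, collapses whitespace.
    """
    # NFC turns "e" + combining accent into the single character "é",
    # so both encodings of the same letter map to one symbol.
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Collapse runs of whitespace and trim the ends.
    return _whitespace_re.sub(" ", text).strip()
```

The NFC step matters because a transcript can encode the same letter either as one code point or as a base letter plus a combining accent, and without it the two forms would not match the same entry in the symbol table.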