lalimili6 closed this issue 3 years ago.
```python
# (end of the model-loading call; the start of this snippet was not captured)
    config=tacotron2_config, pretrained_path=pretrained_path1, training=False, name="tacotron2"
)
```
Can you try the newest code? Based on the TensorBoard plots, everything seems OK. Maybe you should try extracting durations and training FastSpeech2, to check whether the problem lies with Tacotron-2 or with the dataset.
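Extracting durations from a Tacotron-2 alignment can be sketched as follows. For every decoder frame, pick the input token with the largest attention weight. Then count how many frames each token receives. This is a minimal pure-Python sketch, not the repository's extraction script. It assumes the alignment has one row per input token and one column per decoder frame, and a reduction factor of 1. `durations_from_alignment` is a hypothetical helper.

```python
from typing import List, Sequence


def durations_from_alignment(alignment: Sequence[Sequence[float]]) -> List[int]:
    """Turn an attention matrix into per-token durations (in mel frames).

    Assumes alignment[i][t] is the attention weight of input token i at
    decoder frame t (rows = tokens, columns = frames) and a reduction
    factor of 1, so each decoder step yields exactly one mel frame.
    """
    n_tokens = len(alignment)
    if n_tokens == 0:
        return []
    n_frames = len(alignment[0])
    durations = [0] * n_tokens
    for t in range(n_frames):
        # The token the decoder attends to most strongly at frame t "owns" that frame.
        best_token = max(range(n_tokens), key=lambda i: alignment[i][t])
        durations[best_token] += 1
    return durations


if __name__ == "__main__":
    # Toy 3-token x 4-frame alignment with a clean diagonal.
    toy = [
        [0.9, 0.8, 0.1, 0.0],
        [0.1, 0.2, 0.9, 0.3],
        [0.0, 0.0, 0.0, 0.7],
    ]
    durations = durations_from_alignment(toy)
    print(durations)  # [2, 1, 1]
    assert sum(durations) == len(toy[0])  # durations cover every mel frame
```

By construction the durations sum to the number of mel frames, which is what FastSpeech2's length regulator needs. A token that never wins a single frame gets duration 0. Many such zeros usually point to a broken alignment.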
Hi,

Thanks for this project. I am training a TTS model on my own dataset for the Farsi language. I segmented an audiobook read by a single speaker. I built the dataset in the LJSpeech format and used the LJSpeech preprocessing with custom symbols (see #350). The dataset contains 25 hours of speech in 12k wave files, with an average duration of 5.90 seconds per file.

I trained a Tacotron-2 model following the model and configuration in the examples. For the vocoder I use the multiband_melgan model pretrained on English LJSpeech. I train on a 1030 GPU with 2 GB of memory, and because it runs out of memory I had to set the batch size to 2. I have trained the model for 68k steps so far. The decoded waves are poor and unintelligible. Here are my alignment plots.

Could you suggest how to improve the output waves? For example, I could:
- continue training,
- train my own vocoder instead of the LJSpeech one,
- revise my dataset,
- add more voices, or
- switch to another model such as FastSpeech2 (I have also trained a Kaldi model for alignment).

Are there any details about the dataset or the output that I can share to help?
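The dataset figures quoted above (25 hours, 12k files, 5.90 s average) can be recomputed directly from the audio files. They are worth re-checking, because 25 hours spread over 12k files would average 7.5 s per file, not 5.90 s. The standard-library sketch below does this. It assumes uncompressed PCM `.wav` files sitting directly in one folder; the `wavs` path is a hypothetical LJSpeech-style location. `wav_seconds` and `dataset_stats` are hypothetical helpers.

```python
import wave
from pathlib import Path
from typing import Dict


def wav_seconds(path: Path) -> float:
    """Duration of one WAV file in seconds (the stdlib wave module reads PCM only)."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def dataset_stats(wav_dir: str) -> Dict[str, float]:
    """Count files and summarise clip lengths for every *.wav directly inside wav_dir."""
    durations = [wav_seconds(p) for p in sorted(Path(wav_dir).glob("*.wav"))]
    if not durations:
        return {"files": 0, "total_hours": 0.0, "mean_seconds": 0.0, "max_seconds": 0.0}
    total = sum(durations)
    return {
        "files": len(durations),
        "total_hours": total / 3600.0,
        "mean_seconds": total / len(durations),
        "max_seconds": max(durations),
    }


if __name__ == "__main__":
    # "wavs" is a hypothetical path: point it at your LJSpeech-style audio folder.
    for key, value in dataset_stats("wavs").items():
        print(f"{key}: {value:.2f}")
```

Reporting `max_seconds` alongside the mean shows whether a few unusually long clips are hiding behind the average.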
TensorBoard:
Alignment:
Predicted outputs at 68k steps:
Decoded prediction for a sentence at 68k steps. The generated wave is poor and unintelligible.
Decoded prediction for a sentence at 68k steps:
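The alignment plots can also be summarised as numbers. One option is the "focus rate" from the FastSpeech paper: take the largest attention weight in each decoder frame, then average over frames. A sharp alignment scores close to 1; a blurred one scores lower. A second simple check counts how often the most-attended token moves backwards, since TTS alignments should advance monotonically through the text. This is an illustrative sketch, not part of this repository. It assumes `alignment[i][t]` is the weight of input token `i` at decoder frame `t`. `focus_rate` and `backward_jumps` are hypothetical helpers, and there is no universally agreed threshold for a "good" score.

```python
from typing import Sequence


def focus_rate(alignment: Sequence[Sequence[float]]) -> float:
    """Mean over decoder frames of the largest attention weight in that frame.

    Assumes alignment[i][t] is the weight of input token i at decoder frame t.
    """
    n_tokens = len(alignment)
    n_frames = len(alignment[0]) if n_tokens else 0
    if n_frames == 0:
        return 0.0
    peaks = [max(alignment[i][t] for i in range(n_tokens)) for t in range(n_frames)]
    return sum(peaks) / n_frames


def backward_jumps(alignment: Sequence[Sequence[float]]) -> int:
    """Count frames where the most-attended token index moves backwards.

    A healthy TTS alignment moves forward (or stays put) through the text.
    """
    n_tokens = len(alignment)
    n_frames = len(alignment[0]) if n_tokens else 0
    argmaxes = [max(range(n_tokens), key=lambda i: alignment[i][t]) for t in range(n_frames)]
    return sum(1 for prev, cur in zip(argmaxes, argmaxes[1:]) if cur < prev)
```

These numbers are cheap to track. Compute both on the same few sentences at each checkpoint (for example at 68k steps and later). That helps show whether continued training is still sharpening the attention.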
I used the code below to generate the waves and plots:
Best regards,