begeekmyfriend / tacotron2

Forked from NVIDIA/tacotron2 and merged with Rayhane-mamah/Tacotron-2
BSD 3-Clause "New" or "Revised" License

How to improve TTS result #7

Closed: freecui closed this issue 4 years ago

freecui commented 4 years ago

Attached: step-140000-mel-spectrogram, step-140000-align. These are my synthesizer results. Are they OK?

result.zip

This is my vocoder result. You can hear that the audio (130k_steps_5_gen_batched_target8000_overlap400) is not good; it sounds like noise. I don't know how to fix it.

freecui commented 4 years ago

I didn't trim leading and trailing silence in the training data.
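For reference, untrimmed leading and trailing silence often hurts attention alignment in Tacotron-style models. Below is a minimal sketch of how the wavs could be trimmed with librosa before preprocessing; the file paths, sample rate, and top_db threshold are placeholders, not values taken from this repository:

import librosa
import soundfile as sf

# Load the wav and trim leading/trailing silence quieter than top_db below the peak.
wav, sr = librosa.load("sample.wav", sr=22050)      # placeholder path and sample rate
trimmed, _ = librosa.effects.trim(wav, top_db=25)   # top_db is an assumed value; tune per dataset
sf.write("sample_trimmed.wav", trimmed, sr)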

begeekmyfriend commented 4 years ago

Did you use preprocess.py for your training data, and script/inference_tacotron2.sh for Griffin-Lim (G&L) evaluation? The mel spectrograms in this project range within [-4, 4].

python preprocess.py
bash script/inference_tacotron2.sh
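
For context, the [-4, 4] range usually comes from a symmetric mel normalization in the style of Rayhane-mamah/Tacotron-2. The sketch below shows that transform; the parameter names are illustrative and may differ from this repository's preprocess.py:

import numpy as np

max_abs_value = 4.0     # assumed hparam, matching the [-4, 4] range mentioned above
min_level_db = -100.0   # assumed hparam

def normalize(mel_db):
    # mel_db: mel spectrogram in dB, expected to lie within [min_level_db, 0];
    # map it symmetrically into [-max_abs_value, max_abs_value] and clip outliers.
    scaled = 2 * max_abs_value * ((mel_db - min_level_db) / (-min_level_db)) - max_abs_value
    return np.clip(scaled, -max_abs_value, max_abs_value)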
freecui commented 4 years ago

Do you mean normalization? Yes, I normalized with max_abs_value=4, but I did it following the project https://github.com/CorentinJ/Real-Time-Voice-Cloning, and I only modified symbols.py and the 'cleaners' to match your project. Is there any problem with preprocessing Chinese audio with Real-Time-Voice-Cloning? Of course, I did not use its text-audio alignment.
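One quick way to check whether the Real-Time-Voice-Cloning preprocessing actually matches the expected range is to inspect a saved mel feature directly; a sketch follows, where the .npy path is a placeholder for wherever the preprocessed mels are written:

import numpy as np

# Print the shape and value range of one preprocessed mel; values should fall
# within [-4, 4] if max_abs_value=4 symmetric normalization was applied.
mel = np.load("training_data/mels/mel-000001.npy")  # placeholder path
print(mel.shape, float(mel.min()), float(mel.max()))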

freecui commented 4 years ago

I don't think my training audio is very standard Mandarin. Do you think the bad result is related to the audio?

begeekmyfriend commented 4 years ago

130k steps is too few for WaveRNN training; please be patient and wait until 600k steps.

freecui commented 4 years ago

OK, I will continue training WaveRNN. What is your WaveRNN loss? Mine is still above 2 at 130k steps.

freecui commented 4 years ago

My synthesizer loss is 3.4+, and I found that it is no longer declining.

freecui commented 4 years ago

[Attached screenshot: 微信图片_20191216175500 (WeChat image)]

begeekmyfriend commented 4 years ago

Different mel spectrograms have different loss values. Typically the loss becomes steady at around 600k steps.