Closed — freecui closed this issue 4 years ago
I didn't trim the leading and trailing silence on the training data.
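Untrimmed leading/trailing silence is a common cause of attention problems in Tacotron training. As a minimal sketch of what silence trimming does (a numpy-only approximation of librosa.effects.trim; the top_db threshold and frame sizes here are illustrative assumptions, not this project's actual settings):

```python
import numpy as np

def trim_silence(wav, top_db=25, frame_length=2048, hop_length=512):
    """Trim leading/trailing silence from a mono waveform.

    Frames whose RMS energy is more than `top_db` below the loudest
    frame are treated as silence; only the span between the first and
    last non-silent frames is kept.
    """
    # Frame-wise RMS energy
    n_frames = max(1, 1 + (len(wav) - frame_length) // hop_length)
    rms = np.array([
        np.sqrt(np.mean(wav[i * hop_length:i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    # dB relative to the loudest frame
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / max(rms.max(), 1e-10))
    loud = np.flatnonzero(db > -top_db)
    if loud.size == 0:
        return wav[:0]  # the whole clip is silence
    start = loud[0] * hop_length
    end = min(len(wav), loud[-1] * hop_length + frame_length)
    return wav[start:end]
```

In practice librosa.effects.trim(wav, top_db=...) does the same job in one call.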
Did you use preprocess.py for your training data, and script/inference_tacotron2.sh for Griffin-Lim (G&L) evaluation? The mel spectrograms in this project range over [-4, 4].
python preprocess.py
bash script/inference_tacotron2.sh
Do you mean normalization? Yes, I did normalize, with max_abs_value=4, but I did it following the project https://github.com/CorentinJ/Real-Time-Voice-Cloning, and I only modified symbols.py and the 'cleaners' to match your project. Is there any problem with preprocessing the audio with Real-Time-Voice-Cloning for Chinese? Of course, I did not use its text-audio alignment.
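For reference, the symmetric mel normalization in question maps dB-scale mel values onto [-max_abs_value, max_abs_value]. A sketch of that mapping, assuming the common Real-Time-Voice-Cloning defaults (min_level_db=-100, symmetric mels, clipping enabled — verify against your own hparams):

```python
import numpy as np

# Assumed hyperparameters (check your hparams.py)
MIN_LEVEL_DB = -100.0
MAX_ABS_VALUE = 4.0

def normalize_mel(S):
    """Map a dB-scale mel spectrogram (MIN_LEVEL_DB..0) to [-4, 4]."""
    scaled = (S - MIN_LEVEL_DB) / (-MIN_LEVEL_DB)       # 0..1
    return np.clip(
        2 * MAX_ABS_VALUE * scaled - MAX_ABS_VALUE,     # -4..4
        -MAX_ABS_VALUE, MAX_ABS_VALUE,
    )
```

If the synthesizer was trained on mels in one range and the vocoder expects another, the generated audio will be badly degraded, so it is worth confirming both sides use the same max_abs_value.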
I don't think my training audio is very standard Mandarin. Do you think the bad result is related to the audio?
130k steps is too short for WaveRNN training; please wait until 600k steps and be patient.
OK, I will continue training WaveRNN. What is your WaveRNN loss? Mine is above 2 at 130k steps.
My synthesizer loss is 3.4+, and I found that the loss doesn't decline.
Different mel spectrogram settings give different loss values. Typically the loss becomes steady at around 600k steps.
This is my synthesizer result; is it OK?
result.zip
This is my vocoder result. You can hear that the generated audio (130k_steps_5_gen_batched_target8000_overlap400) is not good; it sounds like noise, and I don't know how to fix it.
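The target8000_overlap400 in the filename refers to batched WaveRNN generation: the utterance is split into folds of 8000 samples generated in parallel, with 400 samples of overlap crossfaded between neighbours. A sketch of the fold count computation in the common fatchord/WaveRNN scheme (illustrative, not necessarily this repo's exact code):

```python
def num_folds(total_len, target=8000, overlap=400):
    """How many parallel folds batched generation uses for an utterance.

    Each full fold covers target + overlap samples; a leftover tail
    shorter than that adds one padded fold.
    """
    folds = (total_len - overlap) // (target + overlap)
    extended_len = folds * (target + overlap) + overlap
    if total_len - extended_len != 0:  # remaining samples need one more fold
        folds += 1
    return folds
```

Artifacts at fold boundaries can add clicks, but broadband noise across the whole clip usually means the vocoder is simply undertrained (as noted above, ~130k steps is early) or the mel ranges of synthesizer and vocoder don't match; generating with batched=False is a quick way to rule out the fold crossfade.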