TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

How to fine tune FastSpeech/MelGAN #134

Closed adfost closed 4 years ago

adfost commented 4 years ago

I am working on fine-tuning FastSpeech and MelGAN on a new dataset whose format is similar to LJSpeech, to reduce preprocessing difficulties.

However, I trained both models and got an error. I suspect the problem is with FastSpeech: when I paired the MelGAN model with a FastSpeech model trained on another dataset, I got a bad result, but not as bad as the one with the new FastSpeech model. After 1000 epochs, the FastSpeech model gives a result with no signs of progress. I can't expect a good model after 1000 epochs, but I find it hard to believe that I would get no real result whatsoever. Maybe this is an issue with the version of TensorFlowTTS I am using?

I would like to upload some wav files produced by the model but I can't ...

Any insight would be welcome.

dathudeptrai commented 4 years ago

@adfost can you share your TensorBoard?

ZDisket commented 4 years ago

@adfost What durations are you using for FastSpeech?

adfost commented 4 years ago

@ZDisket I used the program suggested for the durations, extract_duration.py.

adfost commented 4 years ago

@dathudeptrai I was training via the command line, so I'm not sure how to do that.

ZDisket commented 4 years ago

@adfost What Tacotron2 model did you use to extract durations?

adfost commented 4 years ago

@ZDisket examples/tacotron2/extract_duration.py as suggested

adfost commented 4 years ago

@ZDisket @dathudeptrai Yesterday I updated my copy of this repository, and apparently as a result, when I try to run this file again I get the following error:

ValueError: Index out of range using input dim 1; input has only 1 dims for '{{node strided_slice_3}} = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=5, ellipsis_mask=0, end_mask=7, new_axis_mask=0, shrink_axis_mask=0](mel_gts, strided_slice_3/stack, strided_slice_3/stack_1, strided_slice_3/stack_2)' with input shapes: [32], [3], [3], [3] and with computed input tensors: input[3] = <1 1 1>.
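
As a reading aid for the error above, here is a minimal sketch that decodes the mask values into a Python-style slice and compares the number of sliced axes with the input rank. It assumes the usual StridedSlice convention that a set bit i in begin_mask or end_mask means the begin or end value for axis i is ignored. The helper names describe_strided_slice and check_rank are hypothetical and are not part of TensorFlow or TensorFlowTTS.

```python
def describe_strided_slice(begin_mask: int, end_mask: int, num_indices: int) -> str:
    """Render the Python-style slice that the mask values encode.

    Assumption: a set bit i in begin_mask (or end_mask) means the begin
    (or end) value for axis i is ignored and the full range is used.
    """
    parts = []
    for axis in range(num_indices):
        begin = "" if begin_mask & (1 << axis) else f"b{axis}"
        end = "" if end_mask & (1 << axis) else f"e{axis}"
        parts.append(f"{begin}:{end}")
    return "[" + ", ".join(parts) + "]"


def check_rank(input_shape, num_indices: int) -> str:
    """Report whether the slice addresses more axes than the input has."""
    rank = len(input_shape)
    if num_indices > rank:
        return f"slice uses {num_indices} axes but input has rank {rank}"
    return "ok"


# Values copied from the error message: begin_mask=5, end_mask=7,
# three slice indices (the stack inputs have shape [3]), input shape [32].
expr = describe_strided_slice(begin_mask=5, end_mask=7, num_indices=3)
print("mel_gts" + expr)
print(check_rank([32], 3))
```

Under that reading, the node slices mel_gts along three axes while the tensor it receives has only one axis of length 32, which matches the "input has only 1 dims" part of the message.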

dathudeptrai commented 4 years ago

I fixed it today. Please pull the newest code.

adfost commented 4 years ago

Thanks for that. I will try to train again. I suspect that part of the problem might be due to the version anyway, so I think that retraining will likely help.

adfost commented 4 years ago

Actually, I think I caught a bug in your new implementation: you use both mel_length and mel_lengths for what looks like it should be the same variable.

adfost commented 4 years ago

[Image: Screen Shot 2020-07-22 at 11 52 27 AM]

This is the model architecture after adding the extra layer btw.

adfost commented 4 years ago

2020-07-22 19:09:06,046 (base_trainer:831) INFO: (Step: 200) train_duration_loss = 0.6136.
2020-07-22 19:09:06,047 (base_trainer:831) INFO: (Step: 200) train_mel_loss_before = 0.4136.
2020-07-22 19:09:06,047 (base_trainer:831) INFO: (Step: 200) train_mel_loss_after = 0.4130.

adfost commented 4 years ago

2020-07-22 19:20:58,829 (base_trainer:831) INFO: (Step: 2800) train_duration_loss = 0.2757.
2020-07-22 19:20:58,830 (base_trainer:831) INFO: (Step: 2800) train_mel_loss_before = 0.2954.
2020-07-22 19:20:58,831 (base_trainer:831) INFO: (Step: 2800) train_mel_loss_after = 0.2936.
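
To put numbers on the change between the two log excerpts above, the following sketch parses the logged values and computes the fractional decrease of each loss from step 200 to step 2800. The log text is copied from the comments; the regular expression and the function names parse_losses and relative_drop are illustrative assumptions, not part of the TensorFlowTTS trainer.

```python
import re

# Log lines copied from the two comments above (steps 200 and 2800).
LOG_TEXT = """
2020-07-22 19:09:06,046 (base_trainer:831) INFO: (Step: 200) train_duration_loss = 0.6136.
2020-07-22 19:09:06,047 (base_trainer:831) INFO: (Step: 200) train_mel_loss_before = 0.4136.
2020-07-22 19:09:06,047 (base_trainer:831) INFO: (Step: 200) train_mel_loss_after = 0.4130.
2020-07-22 19:20:58,829 (base_trainer:831) INFO: (Step: 2800) train_duration_loss = 0.2757.
2020-07-22 19:20:58,830 (base_trainer:831) INFO: (Step: 2800) train_mel_loss_before = 0.2954.
2020-07-22 19:20:58,831 (base_trainer:831) INFO: (Step: 2800) train_mel_loss_after = 0.2936.
"""

# Matches "(Step: N) name = value"; the trailing period is not part of the number.
ENTRY = re.compile(r"\(Step: (\d+)\) (\w+) = (\d+\.\d+)")


def parse_losses(text):
    """Return {step: {loss_name: value}} for every matching log entry."""
    losses = {}
    for step, name, value in ENTRY.findall(text):
        losses.setdefault(int(step), {})[name] = float(value)
    return losses


def relative_drop(losses, first_step, last_step):
    """Fractional decrease of each loss between two logged steps."""
    first, last = losses[first_step], losses[last_step]
    return {name: (first[name] - last[name]) / first[name] for name in first}


losses = parse_losses(LOG_TEXT)
drops = relative_drop(losses, 200, 2800)
for name, drop in drops.items():
    print(f"{name}: {drop:.1%}")
```

This only summarizes the logged numbers; it says nothing about the quality of the synthesized audio.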

adfost commented 4 years ago

So the loss has decreased, but I'm getting exactly the same error.