adfost closed this issue 4 years ago.
@adfost Can you share your TensorBoard?
@adfost What durations are you using for FastSpeech?
@ZDisket I used the script suggested for extracting durations, extract_duration.py.
@dathudeptrai I was training via the command line; I'm not sure how to do that.
@adfost What Tacotron2 model did you use to extract durations?
@ZDisket examples/tacotron2/extract_duration.py as suggested
@ZDisket @dathudeptrai Yesterday I updated the version of this repository I was using, and apparently as a result, when I try to run this file again I get the following error:
ValueError: Index out of range using input dim 1; input has only 1 dims for '{{node strided_slice_3}} = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=5, ellipsis_mask=0, end_mask=7, new_axis_mask=0, shrink_axis_mask=0](mel_gts, strided_slice_3/stack, strided_slice_3/stack_1, strided_slice_3/stack_2)' with input shapes: [32], [3], [3], [3] and with computed input tensors: input[3] = <1 1 1>.
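The error message itself is informative: `mel_gts` reached the `strided_slice` node as a rank-1 tensor of shape [32] (e.g. a vector of lengths), while the slice expects at least rank 3 (batch, time, n_mels). A minimal NumPy sketch of the same class of failure (illustrative only, not the repo's actual code):

```python
import numpy as np

# Hypothetical illustration: `mel_gts` arrives as a rank-1 tensor of
# shape (32,), but the code slices it as if it were rank-3.
mel_gts = np.zeros(32)  # rank-1 -- what the node actually received

def first_frame(mels):
    # Analogue of the strided_slice: take the first frame per utterance.
    return mels[:, 0, :]  # requires rank >= 3

try:
    first_frame(mel_gts)
except IndexError as e:
    print("fails on rank-1 input:", e)

# With the expected rank-3 tensor the same slice works.
ok = first_frame(np.zeros((32, 100, 80)))
print(ok.shape)  # (32, 80)
```

So the fix on the repo side presumably ensured `mel_gts` is passed as the full mel-spectrogram batch rather than a 1-D tensor.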
I fixed it today. Please pull the newest code.
Thanks for that. I will try to train again. I suspect part of the problem may have been due to the version anyway, so retraining will likely help.
Actually, I think I caught a bug in your new implementation: you use both mel_length and mel_lengths for what looks like it should be the same variable.
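A mismatch like that typically surfaces as a NameError at runtime. A minimal hypothetical sketch (the function and variable names here are illustrative, not the repo's actual code):

```python
# Hypothetical sketch of the kind of bug described: one function uses both
# `mel_lengths` (plural) and `mel_length` (singular) for what should be the
# same variable, so the undefined name raises when the code path runs.
def mask_mels(mels, mel_lengths):
    max_len = max(mel_lengths)             # plural: the real parameter
    # Typo: `mel_length` (singular) was never defined in this scope.
    return [m[:mel_length] for m in mels]  # NameError at runtime

try:
    mask_mels([[0.1, 0.2, 0.3]], [3])
except NameError as e:
    print("bug triggers:", e)
```

Bugs like this only fire when the affected branch executes, which is why they can slip past a quick smoke test.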
This is the model architecture after adding the extra layer, by the way.
2020-07-22 19:09:06,046 (base_trainer:831) INFO: (Step: 200) train_duration_loss = 0.6136.
2020-07-22 19:09:06,047 (base_trainer:831) INFO: (Step: 200) train_mel_loss_before = 0.4136.
2020-07-22 19:09:06,047 (base_trainer:831) INFO: (Step: 200) train_mel_loss_after = 0.4130.
2020-07-22 19:20:58,829 (base_trainer:831) INFO: (Step: 2800) train_duration_loss = 0.2757.
2020-07-22 19:20:58,830 (base_trainer:831) INFO: (Step: 2800) train_mel_loss_before = 0.2954.
2020-07-22 19:20:58,831 (base_trainer:831) INFO: (Step: 2800) train_mel_loss_after = 0.2936.
So the loss has decreased, but I'm getting exactly the same error.
I am working on fine-tuning FastSpeech and MelGAN on a new dataset with a format similar to LJSpeech, to reduce preprocessing difficulties.
However, after training both models I get an error. I suspect the problem is with FastSpeech: when I used the MelGAN model with a FastSpeech model trained on another dataset, I got a bad result, but not as bad as with the new FastSpeech model. After 1000 epochs, the FastSpeech model produces output with no signs of progress. Although I cannot expect a good model after only 1000 epochs, I can't believe I would get no real result whatsoever. Maybe this is an issue with the version of TensorFlowTTS I am using?
I would like to upload some WAV files produced by the model, but I can't ...
Any insight would be welcome.