Closed FrischJulien closed 2 years ago
Where do you change max_decoder_steps?
In the config.json file, after training and before inference (see config.txt attached).
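For readers following along, the edit described above can be sketched in Python. This is a minimal illustration, not code from the TTS repository: the helper name set_max_decoder_steps is hypothetical, and it assumes config.json is plain JSON (no comments) with max_decoder_steps as a top-level key.

```python
import json
from pathlib import Path


def set_max_decoder_steps(config_path, steps):
    """Rewrite max_decoder_steps in a JSON config and return the previous value.

    Hypothetical helper: assumes the file is plain JSON and that the key
    lives at the top level of the config.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("max_decoder_steps")  # None if the key was absent
    config["max_decoder_steps"] = int(steps)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return previous
```

Calling set_max_decoder_steps(config_path, 10000) before running inference would then apply the change to the same file that is passed via --config_path.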
@erogol did you get a chance to look into it, or did anyone face the same issue?
Same error here, but with Tacotron2-DCA. Even when I change max_decoder_steps to 20k, it still shows the error. tacotron_train.py
@xettrisomeman How long did you train your model?
I get the error while training; I have not run inference yet.
Has this issue been fixed in the meantime, in any way? I cannot run my training for longer than one or two hours before this problem comes up, at the point where the test sentences are generated (unsuccessfully).
Describe the bug
After training Tacotron2-DDC for about 140k iterations (batch size 30), I am unable to properly synthesize speech, despite getting very decent audio samples under Eval Audio and Train Audio in TensorBoard. Whatever value I use for max_decoder_steps, I always reach the limit during inference and end up with synthesized speech that contains barely a second of properly decoded audio. See the two examples below (with max_decoder_steps set to 500 and 10000). exemple_max_decoder_steps.zip
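To put the two limits in perspective, here is a rough back-of-the-envelope sketch of how much audio a given max_decoder_steps allows. The values are assumptions, not taken from the attached config: one frame per decoder step (r=1), a hop length of 256 samples, and a 22050 Hz sample rate (the last only suggested by the "22khz" in the output paths). The function name is hypothetical.

```python
def max_audio_seconds(max_decoder_steps, r=1, hop_length=256, sample_rate=22050):
    """Upper bound on audio length if the decoder runs until the step limit.

    Assumed defaults: r frames per decoder step, hop_length samples per
    frame, sample_rate samples per second. Replace them with the values
    from your own config.
    """
    frames = max_decoder_steps * r
    return frames * hop_length / sample_rate


for steps in (500, 10000):
    print(f"{steps} steps -> at most {max_audio_seconds(steps):.1f} s of audio")
```

Under these assumed values, 500 steps allow roughly 6 seconds and 10000 steps roughly 2 minutes. If that holds for the real config, a single sentence that still reaches the 10000-step limit would suggest the decoder is not stopping on its own, rather than the limit being too low.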
To Reproduce
Train Tacotron2-DDC for about 140k steps with the config attached
Run inference through the code below:
model_path="/home/ec2-user/SageMaker/TTS/run-July-22-2022_06+26PM-c44e39d9/checkpoint_140000.pth"
output_directory="/home/ec2-user/SageMaker/testouille/TTS_22khz_espeak_al100/"
config_path="/home/ec2-user/SageMaker/TTS/run-July-22-2022_06+26PM-c44e39d9/config.json"
speaker="bernard"
output_path=output_directory+"taco_22khz_espeak_al200_80k"+speaker+".wav"
!cd ./TTS && python3 ./TTS/bin/synthesize.py \
    --text "Les sanglots longs des violons de l'automne, blessent mon coeur d'une langueur monotone." \
    --out_path $output_path \
    --model_path $model_path \
    --config_path $config_path \
    --speaker_idx $speaker \
    --use_cuda true
config.txt
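Outside a notebook, the same call could be assembled as an argument list, which avoids shell quoting problems with the apostrophes in the French text. This is only a sketch: build_synthesize_command is a hypothetical helper, and it uses nothing beyond the script path and flags that appear in the command above.

```python
from pathlib import Path


def build_synthesize_command(text, model_path, config_path,
                             output_directory, speaker, use_cuda=True):
    """Build the argv list for the synthesize.py call shown above.

    Hypothetical helper: only the flags already used in this issue are
    emitted, and the output file name follows the same pattern.
    """
    out_path = Path(output_directory) / f"taco_22khz_espeak_al200_80k{speaker}.wav"
    return [
        "python3", "./TTS/bin/synthesize.py",
        "--text", text,
        "--out_path", str(out_path),
        "--model_path", str(model_path),
        "--config_path", str(config_path),
        "--speaker_idx", speaker,
        "--use_cuda", "true" if use_cuda else "false",
    ]
```

The resulting list can be passed to subprocess.run(cmd, cwd="./TTS", check=True), which mirrors the cd ./TTS step of the notebook cell.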
Expected behavior
The same level of audio quality as the samples displayed under Eval Audio and Train Audio in TensorBoard.
Logs
Environment
Additional context
I am using distributed training.