Open ErfolgreichCharismatisch opened 3 years ago
Might be related to my issue #502, since I get the same error IIRC. I still don't know the answer either, though, sorry.
https://github.com/NVIDIA/tacotron2/blob/185cd24e046cc1304b4f8e564734d2498c6e2e6f/hparams.py#L59
You can change the max number of steps, but the model can have issues above 1000.
Normally this error means you have fed too much text into the model at inference time, or your model is not trained well.
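For reference, the cap linked above is just a field on the hyperparameter object, so it can be raised before building the model for inference. A minimal sketch, using a stand-in for the repo's `create_hparams()` (the real function in hparams.py returns many more fields):

```python
from types import SimpleNamespace

# Stand-in for create_hparams() from the repo's hparams.py; the real
# function returns a much larger set of fields, with max_decoder_steps=1000
# as the default.
def create_hparams():
    return SimpleNamespace(max_decoder_steps=1000)

hparams = create_hparams()
hparams.max_decoder_steps = 2000  # raise the cap before constructing the model
```

As noted above, raising this past ~1000 lets longer audio be generated but does not fix a model that stops attending; it just delays the cutoff.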
> your model is not trained well.

That's what I am referring to.
- Ensure you have a large amount of data (30+ minutes)
- Use a pretrained model as a base to `--warm_start` from
- Ensure all transcripts in your dataset match the audio perfectly
- Remove any files with background noise
- Perform inference in batches and filter out spectrograms with poor alignments, or use beam search/greedy search style inference with chunks of frames.

If things still fail:

- Decrease hop_length if your speaker speaks abnormally fast, and retrain the vocoder + Tacotron from scratch on a large dataset before transfer-learning, OR
- Use a different model that uses duration-based alignment
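The batch-filtering idea in the fifth bullet can be sketched as follows. This is a hedged illustration, not code from the repo: `focus_rate` and `filter_batch` are hypothetical names, and `alignment` is assumed to be a list of per-decoder-step attention distributions over encoder steps (each row sums to 1).

```python
def focus_rate(alignment):
    """Mean of the max attention weight per decoder step.

    Close to 1.0 means the attention is sharp (one input token per
    output frame); low values suggest a diffuse, unstable alignment.
    """
    return sum(max(row) for row in alignment) / len(alignment)

def filter_batch(outputs, threshold=0.5):
    """Keep only (mel, alignment) pairs whose alignment looks confident."""
    return [(mel, al) for mel, al in outputs if focus_rate(al) >= threshold]

# Toy example: one sharp alignment, one diffuse one.
sharp = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
diffuse = [[0.34, 0.33, 0.33]] * 3
kept = filter_batch([("mel_a", sharp), ("mel_b", diffuse)], threshold=0.5)
```

Generating several candidates per sentence and keeping the one with the best score is the "filtering out poor results + instability" mentioned below.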
This will be helpful for my issue too, I'm sure, but... I'm honestly new to this and have no clue what some of this means. Specifically:
2: where do you put `--warm_start` and such? How do you use that? 5: can you describe this again on a complete-idiot's level?
@Fennecai https://github.com/NVIDIA/tacotron2#training-using-a-pre-trained-model Example usage is on the README.
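For question 2, the command from that README section looks like the following (a sketch; the output/log directory names and the checkpoint filename are whatever you use locally):

```shell
# Warm-start training from a pretrained checkpoint; ignores layers that
# depend on the text/speaker setup so you can fine-tune on your own data.
python train.py --output_directory=outdir --log_directory=logdir \
    -c tacotron2_statedict.pt --warm_start
```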
> 5: can you describe this again on a complete-idiot's level?

Not really. This is more for people who can write the inference code than for people who are just trying to use it. But it's a method of filtering out poor results and instability that works quite well.
> that works quite well.
I have to agree. It's good to know that if there's an error, it's not in the original model, so the area where problems arise is very limited and under your control. I learnt the hard way that a fastidiously fostered dataset is the be-all and end-all.
@CookiePPP hi, can you tell me the maximum duration of a .wav file in the dataset? (Some files in my dataset are > 11.5 s, and all files are < 12 s.) Thanks
@toanil315
There is no limit to the maximum duration of your training/validation audio files. The max_decoder_steps setting is only used when generating new audio with Tacotron2 + WaveGlow from new text. In fact, training on longer audio files will increase the max_decoder_steps that you can use safely with the model (a dataset with a maximum duration of 5 s lets you use 1250 max_decoder_steps; a dataset with a maximum duration of 10 s lets you use 2000 max_decoder_steps).
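To connect clip lengths to decoder steps: each decoder step emits one mel frame, so the frame count for a clip follows from the repo's default `sampling_rate=22050` and `hop_length=256` in hparams.py. A back-of-the-envelope sketch (the safe limits quoted above include extra headroom beyond this raw frame count):

```python
# One decoder step == one mel frame, so a clip of `seconds` needs roughly
# seconds * sampling_rate / hop_length decoder steps to reproduce.
def frames_for(seconds, sampling_rate=22050, hop_length=256):
    return int(seconds * sampling_rate / hop_length)

frames_for(5)   # ~430 frames for a 5 s clip
frames_for(10)  # ~861 frames -- still under the default 1000-step cap
```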
@CookiePPP thanks for the reply, have a good day
Whenever you hit the max decoder steps limit during inference, your audio-text pairs have errors. You have to train from scratch again with good pairs.