hifigan v1 gradient explosion?

TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)

https://tensorspeech.github.io/TensorFlowTTS/

Apache License 2.0


hifigan v1 gradient explosion? #547

Closed unparalleled-ysj closed 3 years ago

unparalleled-ysj commented 3 years ago

The number of training steps can be seen from the figure

With some data, the v2 version also produces a straight line after 100k steps. Do you have any suggestions?

ZDisket commented 3 years ago

There was a similar issue a bit ago and I replied with this: https://github.com/TensorSpeech/TensorFlowTTS/pull/387#issuecomment-813606701. Have you tried it?

unparalleled-ysj commented 3 years ago

> There was a similar issue a bit ago and I replied with this: #387 (comment). Have you tried it?

Okay, let me test it. By the way, should the parameter discriminator_train_start_steps be set to 100k, or to 0 as you did?

ZDisket commented 3 years ago

@unparalleled-ysj Try 0 first.
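For readers unfamiliar with the parameter being discussed: a minimal sketch of what a start-step setting like discriminator_train_start_steps usually controls in a GAN vocoder training loop. The helper names below are hypothetical, and the exact comparison the TensorFlowTTS trainer uses (`>=` versus `>`) is an assumption, not something confirmed in this thread.

```python
def adversarial_phase(step: int, discriminator_train_start_steps: int) -> bool:
    """Return True once the discriminator and adversarial loss would be active.

    ASSUMPTION: the trainer switches phases with a `>=` comparison.
    """
    return step >= discriminator_train_start_steps


def describe_schedule(start_steps: int, checkpoints=(0, 50_000, 100_000, 150_000)):
    """Map each checkpoint step to the phase the training loop would be in."""
    return {
        step: "adversarial" if adversarial_phase(step, start_steps) else "generator-only"
        for step in checkpoints
    }


# With 100k the generator trains alone first; with 0 the discriminator is on from the start.
print(describe_schedule(100_000))
print(describe_schedule(0))
```

Under this reading, setting the value to 0 simply removes the generator-only warm-up, so the adversarial loss shapes training from the first step.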

jarred1989 commented 3 years ago

@ZDisket @unparalleled-ysj Any good news? I ran into the same problem, so I changed the optimizer and set discriminator_train_start_steps to 0 according to #387. The synthesized waveforms are no longer silent, but they are even worse than those from mb-melgan.

ZDisket commented 3 years ago

@jarred1989 A while back I came to the conclusion that there is just something wrong with TensorFlowTTS HiFi-GAN that can't be alleviated with simple optimizer or config changes, so I settled on adding the MPD to MB-MelGAN. It trains slower but helps increase multispeaker performance, and it may make it possible to successfully finetune another model to adapt it to a target speaker, which I am testing out.
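As background on the MPD (HiFi-GAN's multi-period discriminator) mentioned above: its core idea is to fold the 1D waveform into a 2D grid whose width is a period, so that each column holds samples spaced one period apart. The sketch below illustrates only that folding step in plain Python. The function name is hypothetical, and zero padding is used here as a simplification rather than as a description of what any particular implementation does.

```python
def reshape_by_period(samples, period):
    """Fold a 1D waveform into rows of length `period`.

    The tail is zero-padded so the length divides evenly
    (a simplification chosen for this sketch).
    """
    padded = list(samples)
    remainder = len(padded) % period
    if remainder:
        padded += [0.0] * (period - remainder)
    return [padded[i:i + period] for i in range(0, len(padded), period)]


def column(grid, index):
    """Return one column: the samples that are `period` steps apart."""
    return [row[index] for row in grid]


grid = reshape_by_period([1.0, 2.0, 3.0, 4.0, 5.0], period=2)
print(grid)
print(column(grid, 0))
```

Each sub-discriminator then looks at one such grid, which is why adding the MPD raises training cost.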

jarred1989 commented 3 years ago

@ZDisket Thanks for your opinion, I may try it out later.

chenht2021 commented 3 years ago

Making the learning rate smaller helps. I tried two corpora, one male and one female. With the default config, training failed; with a smaller learning rate, the results were good.

ZDisket commented 3 years ago

@chenht2010 What learning rate did you use?

chenht2021 commented 3 years ago

I only changed the generator settings: generator_optimizer_params: values: [0.000125, 0.000125, 0.000625, 0.000625, 0.0000625, 0.00003125, 0.000015625, 0.000001]
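To see how a list like this would be applied over training, here is a minimal sketch of a piecewise-constant learning-rate lookup. It mirrors the semantics of `tf.keras.optimizers.schedules.PiecewiseConstantDecay(boundaries, values)`, where the first value applies up to and including the first boundary. The values are copied verbatim from the comment above, including the third and fourth entries, which are larger than the first two. The boundaries are not given in this thread, so the one-boundary-per-100k-steps spacing is purely an assumption for illustration.

```python
from bisect import bisect_left

# Values quoted in the comment above, copied verbatim.
VALUES = [0.000125, 0.000125, 0.000625, 0.000625,
          0.0000625, 0.00003125, 0.000015625, 0.000001]

# ASSUMPTION: boundaries are not stated in the thread; one every 100k steps is hypothetical.
BOUNDARIES = [100_000 * i for i in range(1, len(VALUES))]


def learning_rate(step, boundaries=BOUNDARIES, values=VALUES):
    """Return the learning rate for `step` under a piecewise-constant schedule.

    values[0] applies while step <= boundaries[0], values[1] while
    boundaries[0] < step <= boundaries[1], and so on.
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("need exactly one more value than boundaries")
    return values[bisect_left(boundaries, step)]


for step in (0, 100_000, 250_000, 750_000):
    print(step, learning_rate(step))
```

A schedule needs exactly one more value than it has boundaries, which is why eight values imply seven boundaries here.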

dathudeptrai commented 3 years ago

> only changed the generator: generator_optimizer_params: values: [0.000125, 0.000125, 0.000625, 0.000625, 0.0000625, 0.00003125, 0.000015625, 0.000001]

Can you make a pull request to change the default config for better training? :D

chenht2021 commented 3 years ago

> > only changed the generator: generator_optimizer_params: values: [0.000125, 0.000125, 0.000625, 0.000625, 0.0000625, 0.00003125, 0.000015625, 0.000001]
>
> can you make a pull request to change the default config for better training :D ?

Sure.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.