coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

ParallelWaveGAN config should be adjusted #1187

Closed iamanigeeit closed 2 years ago

iamanigeeit commented 2 years ago

Hi,

I have tried training with the current default config: it drops the learning rate too fast and the model converges to generating noise. This is because, with scheduler_after_epoch=False, ExponentialLR with gamma=0.999 is stepped every training step, so the learning rate effectively reaches 0 within about 10k steps.
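For reference, a quick back-of-the-envelope check of the decay (illustrative numbers only; 0.0002 is the current default lr_gen):

    # With scheduler_after_epoch=False the scheduler is stepped every training step,
    # so gamma=0.999 shrinks the LR multiplicatively at each step.
    gamma = 0.999
    print(gamma ** 10_000)            # ≈ 4.5e-05
    print(0.0002 * gamma ** 10_000)   # an LR starting at 2e-4 is down to ≈ 9e-09 after 10k steps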

The config values from the original paper are:

    batch_size=8,
    stft_loss_weight=1.0,
    mse_G_loss_weight=4.0,
    steps_to_start_discriminator=100000,
    lr_gen=0.0001,
    lr_disc=0.00005,
    lr_scheduler_gen="StepLR",
    lr_scheduler_gen_params={"gamma": 0.5, "step_size": 200000, "last_epoch": -1},
    lr_scheduler_disc="StepLR",
    lr_scheduler_disc_params={"gamma": 0.5, "step_size": 200000, "last_epoch": -1},
    scheduler_after_epoch=False,

It is also possible to use ExponentialLR with some float rounding error:

    lr_scheduler_gen="ExponentialLR",  # one of the schedulers from https://pytorch.org/docs/stable/optim.html
    lr_scheduler_gen_params={"gamma": 0.5**(1/200000), "last_epoch": -1},
    lr_scheduler_disc="ExponentialLR",  # one of the schedulers from https://pytorch.org/docs/stable/optim.html
    lr_scheduler_disc_params={"gamma": 0.5**(1/200000), "last_epoch": -1},
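
As a sanity check of that equivalence (a minimal sketch, assuming the generator starts at lr_gen=0.0001 as in the paper):

    gamma = 0.5 ** (1 / 200000)       # per-step factor that halves the LR every 200k steps
    lr_gen = 0.0001
    for step in (200_000, 400_000, 600_000):
        print(step, lr_gen * gamma ** step)   # ≈ 5e-05, 2.5e-05, 1.25e-05 -- matches StepLR with gamma=0.5, up to float rounding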

With more GPU memory, the batch_size can be increased and the number of steps reduced.

erogol commented 2 years ago

Good catch. Would you send a PR including these changes?

iamanigeeit commented 2 years ago

@erogol Unfortunately I cloned the repo in Oct and made a bunch of changes along the way... here's the new config file: parallel_wavegan_config.txt

iamanigeeit commented 2 years ago

Sorry, I just saw the comments on my PR!

The configs in the original paper are as mentioned earlier:

   stft_loss_weight=1.0,  # currently 0.5
   mse_G_loss_weight=4.0,  # currently 2.5
   steps_to_start_discriminator=100000,  # currently 200000
   lr_gen=0.0001,  # currently 0.0002
   lr_disc=0.00005,  # currently 0.0002

Original paper section 4.1.2:

The hyper-parameter λ_adv in equation (7) was chosen to be 4.0 based on our preliminary experiments.

Note that the discriminator was fixed for the first 100K steps, and two models were jointly trained afterwards.

The initial learning rate was set to 0.0001 and 0.00005 for the generator and discriminator, respectively.
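
Putting the paper's values together, a hedged sketch of what the full change might look like (assuming ParallelWaveganConfig is exposed under TTS.vocoder.configs and accepts the field names used above; not a tested patch):

    from TTS.vocoder.configs import ParallelWaveganConfig  # assumed import path

    config = ParallelWaveganConfig(
        batch_size=8,
        stft_loss_weight=1.0,
        mse_G_loss_weight=4.0,
        steps_to_start_discriminator=100000,
        lr_gen=0.0001,
        lr_disc=0.00005,
        lr_scheduler_gen="StepLR",
        lr_scheduler_gen_params={"gamma": 0.5, "step_size": 200000, "last_epoch": -1},
        lr_scheduler_disc="StepLR",
        lr_scheduler_disc_params={"gamma": 0.5, "step_size": 200000, "last_epoch": -1},
        scheduler_after_epoch=False,
    )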

If the current config works better, then I'm ok with it.