CompVis / taming-transformers

Taming Transformers for High-Resolution Image Synthesis
https://arxiv.org/abs/2012.09841
MIT License

Was an lr (learning rate) scheduler used? #185

Open · Maxlinn opened this issue 1 year ago

Maxlinn commented 1 year ago

Hi CompVis group, thanks for your impressive work; I'd love to deploy your method in my project.

However, I found that no learning rate scheduler (lr scheduler) is used for training. More precisely, the configure_optimizers method in ./taming/models/vqgan.py only returns the two optimizers, and self.learning_rate is set in main.py as model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr, which is a constant.

    def configure_optimizers(self):
        # `self.learning_rate` is assigned from outside the class
        lr = self.learning_rate
        opt_ae = torch.optim.Adam(list(self.encoder.parameters())+
                                  list(self.decoder.parameters())+
                                  list(self.quantize.parameters())+
                                  list(self.quant_conv.parameters())+
                                  list(self.post_quant_conv.parameters()),
                                  lr=lr, betas=(0.5, 0.9))
        opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),
                                    lr=lr, betas=(0.5, 0.9))
        return [opt_ae, opt_disc], []

That seems strange. To my knowledge, when training a big model from scratch, the learning rate should be adjusted with a scheduler, since a large lr is needed at the beginning and a smaller lr for later steps.

Did I misunderstand anything? Could you please give me a hint? Thanks in advance!
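
For what it's worth, PyTorch Lightning allows configure_optimizers to return a second list of scheduler configs, so a warmup schedule could in principle be attached to both optimizers. Below is a minimal sketch, not the repo's code, assuming the same imports and class context as the snippet above; warmup_steps is a made-up value:

    # hypothetical sketch, not the repo's implementation
    def configure_optimizers(self):
        lr = self.learning_rate
        opt_ae = torch.optim.Adam(list(self.encoder.parameters())+
                                  list(self.decoder.parameters())+
                                  list(self.quantize.parameters())+
                                  list(self.quant_conv.parameters())+
                                  list(self.post_quant_conv.parameters()),
                                  lr=lr, betas=(0.5, 0.9))
        opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),
                                    lr=lr, betas=(0.5, 0.9))

        warmup_steps = 10_000  # made-up value, purely illustrative

        def lr_lambda(step):
            # linear warmup from 0 to the full lr, then constant
            return min(1.0, (step + 1) / warmup_steps)

        sched_ae = torch.optim.lr_scheduler.LambdaLR(opt_ae, lr_lambda=lr_lambda)
        sched_disc = torch.optim.lr_scheduler.LambdaLR(opt_disc, lr_lambda=lr_lambda)

        # "interval": "step" makes Lightning step the schedulers every batch
        return [opt_ae, opt_disc], [
            {"scheduler": sched_ae, "interval": "step"},
            {"scheduler": sched_disc, "interval": "step"},
        ]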

Maxlinn commented 1 year ago

I desperately want to know, especially for the training setting with Gumbel-Softmax (the official config is here and reproduced below: https://heibox.uni-heidelberg.de/d/2e5662443a6b4307b470/).

Sincere thanks to anyone who can offer any help.

model:
  base_learning_rate: 4.5e-06
  target: taming.models.vqgan.GumbelVQ
  params:
    kl_weight: 1.0e-08
    embed_dim: 256
    n_embed: 8192
    monitor: val/rec_loss
    temperature_scheduler_config:
      target: taming.lr_scheduler.LambdaWarmUpCosineScheduler
      params:
        warm_up_steps: 0
        max_decay_steps: 1000001
        lr_start: 0.9
        lr_max: 0.9
        lr_min: 1.0e-06
    ddconfig:
      double_z: false
      z_channels: 256
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult:
      - 1
      - 1
      - 2
      - 4
      num_res_blocks: 2
      attn_resolutions:
      - 32
      dropout: 0.0
    lossconfig:
      target: taming.modules.losses.vqperceptual.DummyLoss
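
Side note: in this config the LambdaWarmUpCosineScheduler sits under temperature_scheduler_config, so it appears to anneal the Gumbel-Softmax temperature rather than the optimizer lr (the params simply reuse the lr_* names). As a rough illustration of what a warm-up + cosine decay schedule computes with the values above (an assumed re-implementation, not taming.lr_scheduler itself):

    import math

    # illustrative sketch using the parameter names from the config above
    def warmup_cosine(step, warm_up_steps=0, lr_start=0.9, lr_max=0.9,
                      lr_min=1.0e-06, max_decay_steps=1000001):
        if step < warm_up_steps:
            # linear warm-up from lr_start to lr_max
            return lr_start + (lr_max - lr_start) * step / max(1, warm_up_steps)
        # cosine decay from lr_max down to lr_min over the remaining steps
        t = min(1.0, (step - warm_up_steps) / (max_decay_steps - warm_up_steps))
        return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

    # with warm_up_steps = 0 the value starts at 0.9 and decays towards 1e-6
    print(warmup_cosine(0), warmup_cosine(500_000), warmup_cosine(1_000_000))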

Maxlinn commented 1 year ago

Regarding model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr: if ngpu and bs are both large, the resulting learning rate would be very large. Is that acceptable?
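
For a concrete sense of scale (hypothetical numbers, not from the thread): with the base_learning_rate of 4.5e-6 from the config above, accumulate_grad_batches = 1, ngpu = 8 and bs = 12, the rule gives 1 * 8 * 12 * 4.5e-6 = 4.32e-4, so it is the tiny base_lr that keeps the scaled value moderate even when ngpu * bs is large.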

order-a-lemonade commented 10 months ago

I don't think model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr is a good setting for the learning rate. For me, ngpus=8, bs=3, base_lr=0.0625 (the default in my config), and with this setting my train_loss can't go down. Has anyone gotten a good result with this learning rate setting?

[attached screenshot: training loss curve]

order-a-lemonade commented 10 months ago

Suddenly I noticed that the default base_lr in the config file is 4.5e-6, not 0.0625, so maybe that was the problem. [attached screenshot: config file]
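
Plugging both values into the scaling rule (assuming accumulate_grad_batches = 1, with the ngpus = 8 and bs = 3 from the earlier comment) makes the difference concrete: 1 * 8 * 3 * 0.0625 = 1.5, which is far too large for Adam, whereas 1 * 8 * 3 * 4.5e-6 = 1.08e-4 is a much more typical value. That would plausibly explain why the training loss could not go down with the former setting.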