CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

Autoencoder training details #766

Open jh27kim opened 1 year ago

jh27kim commented 1 year ago

Hi,

I am trying to fine-tune the autoencoder of the Stable Diffusion model.

Could you please provide the details of the learning rate scheduler and optimizer?

Thank you

zjhJOJO commented 10 months ago

The authors train their autoencoders in an adversarial manner, following the paper "Taming Transformers for High-Resolution Image Synthesis". You can also refer to Appendix G for details.

venn0605 commented 3 months ago

I am also wondering about it.

hahamidi commented 2 months ago

Hi, I think the architecture and weights are the ones in the latent-diffusion repo from CompVis (the other repo). The config and weights below are for KL-f8:

Weights from https://ommer-lab.com/files/latent-diffusion/kl-f8.zip

model:
  base_learning_rate: 1.0e-6
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    monitor: "val/rec_loss"
    embed_dim: 4
    lossconfig:
      target: ldm.modules.losses.LPIPSWithDiscriminator
      params:
        disc_start: 50001
        kl_weight: 0.000001
        disc_weight: 0.5

    ddconfig:
      double_z: True
      z_channels: 4
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [ 1,2,4,4 ]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [ ]
      dropout: 0.0
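
To make the numbers in this config easier to read, here is a minimal Python sketch that derives a few quantities from it. The downsampling factor follows the `num_down = len(ch_mult)-1` comment in the config. The other two helpers rest on assumptions that are not stated in this thread and should be checked against your own checkout: that the adversarial term stays off until `disc_start` steps have passed, and that the training script scales `base_learning_rate` linearly by batch size, GPU count, and gradient-accumulation steps. The helper names, and the batch size of 12 in the usage lines, are hypothetical.

```python
# Values copied from the KL-f8 config above.
CONFIG = {
    "base_learning_rate": 1.0e-6,
    "embed_dim": 4,
    "disc_start": 50001,
    "resolution": 256,
    "ch_mult": [1, 2, 4, 4],
}


def downsample_factor(ch_mult):
    """Spatial reduction of the encoder: num_down = len(ch_mult) - 1 halvings."""
    return 2 ** (len(ch_mult) - 1)


def latent_shape(cfg):
    """(channels, height, width) of the latent for one input image."""
    side = cfg["resolution"] // downsample_factor(cfg["ch_mult"])
    return (cfg["embed_dim"], side, side)


def discriminator_active(global_step, disc_start):
    """Assumption: the adversarial loss term is disabled before disc_start."""
    return global_step >= disc_start


def scaled_lr(base_lr, batch_size, n_gpus=1, accumulate_grad_batches=1):
    """Assumption: linear scaling of the base learning rate (hypothetical helper)."""
    return accumulate_grad_batches * n_gpus * batch_size * base_lr


if __name__ == "__main__":
    print(downsample_factor(CONFIG["ch_mult"]))   # 8, hence the name "f8"
    print(latent_shape(CONFIG))                   # (4, 32, 32)
    print(discriminator_active(10_000, CONFIG["disc_start"]))
    # Hypothetical setup: batch size 12 on a single GPU.
    print(scaled_lr(CONFIG["base_learning_rate"], batch_size=12))
```

If the linear-scaling assumption holds for your training script, the value to compare against other fine-tuning recipes is the scaled rate, not the raw `base_learning_rate` in the YAML.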