Open jh27kim opened 1 year ago
The authors train their autoencoders adversarially, following the paper "Taming Transformers for High-Resolution Image Synthesis". You can also refer to Appendix G for details.
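To make "adversarial manner" a bit more concrete, here is a rough sketch of how the autoencoder-side loss is usually assembled in this style of training: reconstruction plus a small KL term, with the adversarial term switched on only after a warm-up step. The function names and the fixed `d_weight` are my simplification (as far as I understand, the real loss computes an adaptive discriminator weight from gradient norms), and the defaults are taken from the KL-f8 config posted further down in this thread.

```python
def adopt_weight(weight, global_step, threshold=0, value=0.0):
    """Return `value` until `global_step` reaches `threshold`, then `weight`."""
    return value if global_step < threshold else weight


def generator_loss(rec_loss, kl_loss, g_loss, global_step,
                   kl_weight=1e-6, disc_start=50001,
                   d_weight=0.5, disc_factor=1.0):
    """Simplified autoencoder-side objective (illustrative, not the repo code).

    rec_loss: reconstruction (pixel + perceptual) loss
    kl_loss:  KL divergence of the latent posterior from a standard normal
    g_loss:   generator term from the discriminator (e.g. -mean(logits_fake))
    d_weight: fixed here for simplicity; assumed adaptive in the real loss
    """
    # The adversarial term contributes nothing before `disc_start` steps.
    factor = adopt_weight(disc_factor, global_step, threshold=disc_start)
    return rec_loss + kl_weight * kl_loss + d_weight * factor * g_loss


# Before the discriminator starts, only reconstruction and KL count.
print(generator_loss(1.0, 0.0, 2.0, global_step=0))
# After disc_start, the adversarial term is added.
print(generator_loss(1.0, 0.0, 2.0, global_step=50001))
```

The point of the warm-up is that the autoencoder first learns a reasonable reconstruction before the discriminator starts pushing on it.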
I am also wondering about it.
Hi, I think the architecture and weights are the ones from the latent-diffusion repo from CompVis (the other repo). The config and weights below are for KL-f8:
Weights from https://ommer-lab.com/files/latent-diffusion/kl-f8.zip
model:
  base_learning_rate: 1.0e-6
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    monitor: "val/rec_loss"
    embed_dim: 4
    lossconfig:
      target: ldm.modules.losses.LPIPSWithDiscriminator
      params:
        disc_start: 50001
        kl_weight: 0.000001
        disc_weight: 0.5
    ddconfig:
      double_z: True
      z_channels: 4
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [ 1,2,4,4 ]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [ ]
      dropout: 0.0
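
For anyone reading the config, here is a small sketch of what `ch_mult`, `z_channels`, and `double_z` imply for the latent size. It relies on the `num_down = len(ch_mult)-1` comment in the config and assumes each downsampling stage halves the spatial resolution, and that `double_z` makes the encoder emit mean and log-variance (twice `z_channels`). The helper name `latent_shape` is made up for illustration.

```python
def latent_shape(resolution, ch_mult, z_channels, double_z=True):
    """Derive the latent layout implied by the ddconfig values (illustrative)."""
    num_down = len(ch_mult) - 1          # per the comment in the config
    factor = 2 ** num_down               # assumed: each stage halves H and W
    side = resolution // factor
    # Assumed: with double_z the encoder outputs mean and log-variance.
    encoder_out_channels = 2 * z_channels if double_z else z_channels
    return {
        "downsample_factor": factor,
        "latent": (z_channels, side, side),
        "encoder_out_channels": encoder_out_channels,
    }


shape = latent_shape(resolution=256, ch_mult=[1, 2, 4, 4], z_channels=4)
print(shape)
```

With the values above this gives a downsampling factor of 8, which matches the "f8" in KL-f8.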
Hi,
I am trying to fine-tune the autoencoder of the Stable Diffusion model.
Could you please provide details of the learning rate scheduler and optimizer?
Thank you
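
Not an authoritative answer, but as far as I can tell from the CompVis latent-diffusion code, `base_learning_rate` is not used directly: the training script scales it by batch size, number of GPUs, and gradient accumulation, and the autoencoder then uses Adam for both the autoencoder and the discriminator with no learning rate scheduler. Please verify this against `main.py` and `AutoencoderKL.configure_optimizers` in that repo. The sketch below only reproduces the scaling arithmetic; the function name and the batch size of 12 are placeholders I picked for illustration.

```python
def effective_learning_rate(base_lr, batch_size, n_gpus=1,
                            accumulate_grad_batches=1, scale_lr=True):
    """Scale the base learning rate by the total effective batch size.

    Assumption: this mirrors the scale_lr behaviour of the training script;
    check the repo before relying on it.
    """
    if not scale_lr:
        return base_lr
    return accumulate_grad_batches * n_gpus * batch_size * base_lr


# base_learning_rate comes from the config above; batch_size=12 is hypothetical.
lr = effective_learning_rate(1.0e-6, batch_size=12, n_gpus=1)
print(f"effective lr: {lr:.2e}")
```

If you fine-tune with a much smaller batch than the original run, the effective learning rate drops accordingly, so it may be worth setting it explicitly rather than relying on the scaling.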