batman47steam opened 5 months ago
Hi, thanks for following our work. Yes, we use the weights of Stable Diffusion's first-stage model, i.e. the SD-VAE. You can find the original model weights here; we removed the denoising UNet because we only care about image compression and reconstruction. The three upsampling and downsampling layers are part of the original model weights, and we do not change them.
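To make the "three upsampling and downsampling layers" concrete, here is a small worked calculation of the latent shape. It assumes the publicly released SD-VAE configuration, in which each downsampling layer halves height and width and the latent has 4 channels; neither value is stated in this thread, and the function names are hypothetical.

```python
from typing import Tuple


def sd_vae_latent_shape(height: int, width: int,
                        num_downsamples: int = 3,
                        latent_channels: int = 4) -> Tuple[int, int, int]:
    """Return (channels, height, width) of the VAE latent for an RGB image.

    Assumption: each downsampling layer halves the spatial size, so three
    layers give an overall factor of 2**3 = 8.
    """
    factor = 2 ** num_downsamples
    if height % factor or width % factor:
        raise ValueError(f"image size must be divisible by {factor}")
    return latent_channels, height // factor, width // factor


def compression_ratio(height: int, width: int, **kwargs) -> float:
    """Number of RGB pixel values divided by number of latent values."""
    c, h, w = sd_vae_latent_shape(height, width, **kwargs)
    return (3 * height * width) / (c * h * w)


print(sd_vae_latent_shape(512, 512))  # (4, 64, 64)
print(compression_ratio(512, 512))    # 48.0
```

Under these assumptions a 512x512 RGB image is represented by a 4x64x64 latent, so the latent mapping model works on a tensor with 48 times fewer values than the input image.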
Hi, thank you for your response. It helps a lot! I find that training on an RTX 3090 GPU is a bit slow. Is this normal? Is the main training overhead the inference of the SD-VAE? The latent mapping model itself seems relatively lightweight.
Yes, it is normal. The SD-VAE is much more complex than the latent mapping model, so SD-VAE inference accounts for much of the training time. Also, I use deep supervision during training (you can find the code here), which means the decoding process is repeated 3 times. If you want to train GMS faster, try removing deep supervision (set ds_list=['out'], code is here).
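A minimal sketch of why deep supervision multiplies the decoder cost: every output named in ds_list is decoded separately before its loss is computed. This is not the GMS training code. The output names "ds_1" and "ds_2", the toy decoder, the MSE loss, and the plain average over outputs are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def deep_supervision_loss(outputs: Dict[str, Vector], target: Vector,
                          decode: Callable[[Vector], Vector],
                          ds_list: List[str]) -> float:
    """Average the reconstruction loss over every output named in ds_list.

    Each supervised output is decoded separately, so the decoder runs
    len(ds_list) times per training step.
    """
    losses = [mse(decode(outputs[name]), target) for name in ds_list]
    return sum(losses) / len(losses)


class CountingDecoder:
    """Hypothetical stand-in for the SD-VAE decoder that counts its calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, latent: Vector) -> List[float]:
        self.calls += 1
        return [2.0 * x for x in latent]  # toy "decoding"


# Hypothetical outputs: "out" is the final prediction, the others are auxiliary.
outputs = {"out": [0.5, 0.5], "ds_1": [0.4, 0.6], "ds_2": [0.3, 0.7]}
target = [1.0, 1.0]

full = CountingDecoder()
deep_supervision_loss(outputs, target, full, ["out", "ds_1", "ds_2"])
print(full.calls)  # 3

fast = CountingDecoder()
loss = deep_supervision_loss(outputs, target, fast, ["out"])
print(fast.calls, loss)  # 1 0.0
```

With ds_list=['out'] only the final output is decoded, which is why removing deep supervision speeds up each training step.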
Thank you very much for your prompt and helpful response! I get it now.
Hello, thanks for your great work. I would like to ask about an error I get when loading the weights for training: RuntimeError: Error(s) in loading state_dict for AutoencoderKL: Missing key(s) in state_dict: "encoder.conv_in.weight", "encoder.conv_in.bias", "encoder.down.0.block.0.norm1.weight", "encoder.down.0.block. [...] Did I load the wrong weights? I downloaded this weight file: https://huggingface.co/stabilityai/stable-diffusion-2/resolve/main/768-v-ema.ckpt?download=true
Since the SD team may have restructured the SD-VAE, please try the model weights at this link.
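One possible explanation for this kind of missing-key error, offered as an assumption rather than a confirmed diagnosis: a full Stable Diffusion checkpoint stores the VAE alongside the UNet and text encoder, with VAE keys prefixed (assumed here to be "first_stage_model."), while a standalone AutoencoderKL expects unprefixed keys such as "encoder.conv_in.weight". The sketch below filters and strips such a prefix from a plain mapping; the function name is hypothetical.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")


def extract_prefixed(state_dict: Mapping[str, T],
                     prefix: str = "first_stage_model.") -> Dict[str, T]:
    """Keep only keys starting with `prefix`, with the prefix stripped.

    Assumption: the full checkpoint stores VAE weights under `prefix`,
    e.g. "first_stage_model.encoder.conv_in.weight".
    """
    sub = {k[len(prefix):]: v for k, v in state_dict.items()
           if k.startswith(prefix)}
    if not sub:
        raise KeyError(f"no keys start with {prefix!r}")
    return sub


# Toy checkpoint with strings in place of tensors.
full_ckpt = {
    "first_stage_model.encoder.conv_in.weight": "w0",
    "first_stage_model.encoder.conv_in.bias": "b0",
    "model.diffusion_model.input_blocks.0.0.weight": "unet",
}
print(sorted(extract_prefixed(full_ckpt)))
# ['encoder.conv_in.bias', 'encoder.conv_in.weight']
```

With PyTorch, one would load the file with `torch.load(path, map_location="cpu")`, take its `"state_dict"` entry if present, and pass the filtered dict to the model's `load_state_dict`. Using the VAE-only weights linked above avoids the issue entirely.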
Thanks for your reply, I'll try it.
Hi, thank you for sharing this work. It has been very inspiring for me. However, I still have some questions about the SD-VAE part. Does the SD-VAE directly use the existing Stable Diffusion VAE? The paper seems to mention that the SD-VAE has three corresponding upsampling and downsampling layers. Was this part designed independently? If so, how are the pretrained SD weights used? Looking forward to your response.