Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License
38.83k stars 5.01k forks

Autoencoder architecture and loss function #240

Open Arksyd96 opened 1 year ago

Arksyd96 commented 1 year ago

Hello everyone,

I am currently working on a medical imaging project that involves a modified latent diffusion model. The implementation is based on some parts of this repo, but I have some questions about certain aspects of the code that I hope someone can help me with.

Firstly, the CompVis version includes a vector-quantized autoencoder (VQ), but this repository only ships the AutoencoderKL. Can someone clarify whether the KL version is simply better, or whether there is a specific reason for using AutoencoderKL in this implementation?
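For anyone comparing the two, the essential difference is the latent bottleneck: the KL variant samples a continuous latent via the reparameterization trick and regularizes it with a KL term, while the VQ variant snaps each latent vector to its nearest codebook entry. A minimal illustrative sketch (not the repo's code; tensor shapes and clamp values are assumptions):

```python
import torch

def kl_bottleneck(moments: torch.Tensor):
    """AutoencoderKL-style: encoder outputs mean and log-variance,
    the latent is sampled via the reparameterization trick."""
    mean, logvar = torch.chunk(moments, 2, dim=1)
    logvar = torch.clamp(logvar, -30.0, 20.0)
    std = torch.exp(0.5 * logvar)
    z = mean + std * torch.randn_like(std)
    # per-element KL divergence against a standard normal prior
    kl = 0.5 * (mean.pow(2) + logvar.exp() - 1.0 - logvar)
    return z, kl

def vq_bottleneck(z: torch.Tensor, codebook: torch.Tensor):
    """VQ-style: each spatial latent vector is replaced by its
    nearest codebook entry."""
    b, c, h, w = z.shape
    flat = z.permute(0, 2, 3, 1).reshape(-1, c)   # (B*H*W, C)
    dists = torch.cdist(flat, codebook)           # distance to every code
    idx = dists.argmin(dim=1)
    z_q = codebook[idx].reshape(b, h, w, c).permute(0, 3, 1, 2)
    # straight-through estimator so gradients still reach the encoder
    z_q = z + (z_q - z).detach()
    return z_q, idx
```

The KL latent is what the diffusion model is trained on in this repo; the VQ latent is discrete, which matters mostly if you want an autoregressive prior over codes.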

Secondly, I noticed this line:

self.loss = instantiate_from_config(lossconfig)
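For context, `instantiate_from_config` (defined in `ldm/util.py`) just imports whatever dotted class path the config's `target` names and constructs it with the optional `params` dict; roughly:

```python
import importlib

# Paraphrase of instantiate_from_config from ldm/util.py: resolve the
# dotted "target" path to a class and instantiate it with "params".
def get_obj_from_str(string: str):
    module, cls = string.rsplit(".", 1)
    return getattr(importlib.import_module(module), cls)

def instantiate_from_config(config: dict):
    return get_obj_from_str(config["target"])(**config.get("params", dict()))

# So a config of {target: torch.nn.Identity} makes self.loss a no-op module.
loss = instantiate_from_config({"target": "torch.nn.Identity"})
```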

However, all I found in the config files was `lossconfig: target: torch.nn.Identity`. I assume there must be a real loss function somewhere, since it is called in the following line:

aeloss, log_dict_ae = self.loss(inputs, reconstructions, posterior, optimizer_idx, self.global_step, last_layer=self.get_last_layer(), split="train")

Can someone please provide more technical detail about this loss function? My understanding is that it combines a reconstruction (MSE) term, a KL-divergence term, and a discriminator (adversarial) term, the latter probably multiplied by a small coefficient.
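As far as I can tell, the training-time loss lives in the CompVis/taming-transformers code (`LPIPSWithDiscriminator`), and the `torch.nn.Identity` in this repo's configs is just a placeholder since these checkpoints are inference-only. A heavily simplified sketch of both sides of the objective, with assumptions stated in the comments (plain L1 in place of L1 + LPIPS perceptual loss, and a fixed adversarial weight instead of the adaptive gradient-based one):

```python
import torch
import torch.nn.functional as F

def generator_loss(inputs, reconstructions, mean, logvar,
                   logits_fake, kl_weight=1e-6, disc_weight=0.5):
    """Generator-side term, in the spirit of LPIPSWithDiscriminator.
    Assumption: L1 reconstruction stands in for L1 + LPIPS."""
    rec_loss = torch.abs(inputs - reconstructions).mean()
    # KL of the diagonal-Gaussian posterior against N(0, I), per sample
    kl_loss = 0.5 * torch.sum(
        mean.pow(2) + logvar.exp() - 1.0 - logvar) / inputs.shape[0]
    # adversarial term: push discriminator logits on reconstructions upward
    g_loss = -torch.mean(logits_fake)
    return rec_loss + kl_weight * kl_loss + disc_weight * g_loss

def discriminator_loss(logits_real, logits_fake):
    """Discriminator-side hinge loss, used in the alternating
    optimizer step (optimizer_idx == 1)."""
    return 0.5 * (F.relu(1.0 - logits_real).mean()
                  + F.relu(1.0 + logits_fake).mean())
```

Note the very small `kl_weight` (on the order of 1e-6), which is why the KL latent stays only loosely Gaussian.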

Lastly, I would like to confirm that the discriminator is indeed inside the loss function. If so, can someone explain how it is incorporated into the loss? I cannot find its architecture anywhere in this repository.
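From what I can see, the discriminator is owned by the loss module itself: in the taming-transformers code it is a pix2pix-style PatchGAN (`NLayerDiscriminator`), instantiated inside the loss and updated via the second optimizer index. A condensed sketch of that kind of architecture (layer counts and channel widths here are assumptions, not the exact original):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Condensed PatchGAN sketch: a small strided conv stack that
    outputs a grid of real/fake logits, one per image patch."""
    def __init__(self, in_channels=3, base=64, n_layers=3):
        super().__init__()
        layers = [nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
                  nn.LeakyReLU(0.2, True)]
        ch = base
        for _ in range(1, n_layers):
            layers += [nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1),
                       nn.BatchNorm2d(ch * 2),
                       nn.LeakyReLU(0.2, True)]
            ch *= 2
        # final 1-channel conv: per-patch logits instead of a single scalar
        layers += [nn.Conv2d(ch, 1, 4, stride=1, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```

Because the output is a spatial grid of logits, the adversarial signal is applied per patch, which is what encourages locally realistic texture in the reconstructions.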

PS: If there is a Discord server where I can find some devs and ask quick questions, that would be great! Thank you very much in advance for your help.