CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

Where is the classifier-free guidance code? #832

Open subhashnerella opened 4 months ago

subhashnerella commented 4 months ago

The README says: sd-v1-4.ckpt: Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

Where is the code that drops the text-conditioning 10% of the time? I am not able to find it.

The p_losses method in ddpm.py (starting at line 1012) never drops the conditioning:

    def p_losses(self, x_start, cond, t, noise=None):
        noise = default(noise, lambda: torch.randn_like(x_start))
        x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)
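        # cond is passed to the model on every step; nothing in this method drops it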
        model_output = self.apply_model(x_noisy, t, cond)

        loss_dict = {}
        prefix = 'train' if self.training else 'val'

        if self.parameterization == "x0":
            target = x_start
        elif self.parameterization == "eps":
            target = noise
        else:
            raise NotImplementedError()

        loss_simple = self.get_loss(model_output, target, mean=False).mean([1, 2, 3])
        loss_dict.update({f'{prefix}/loss_simple': loss_simple.mean()})
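        # ... (rest of the method omitted)

For context, the standard recipe from the classifier-free guidance paper (Ho & Salimans) is to replace the caption with an empty string with some fixed probability before it is encoded; the empty-prompt embedding then doubles as the unconditional input at sampling time. A minimal sketch of what such a dropout step could look like (this is not CompVis code; drop_text_conditioning, drop_prob, and the batch["caption"] key are made up for illustration):

    import random

    def drop_text_conditioning(prompts, drop_prob=0.1):
        # With probability drop_prob, replace each caption with the empty
        # string so the model also learns the unconditional distribution.
        return ["" if random.random() < drop_prob else p for p in prompts]

    # Hypothetical training-loop fragment; model is a LatentDiffusion instance,
    # so the dropped captions are encoded like any other prompt and roughly 10%
    # of the batch trains against the empty-prompt embedding.
    prompts = drop_text_conditioning(batch["caption"], drop_prob=0.1)
    c = model.get_learned_conditioning(prompts)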
karthik-existir commented 1 week ago

I don't think the training code is included in this repo. If you want to see classifier-free guidance sampling in text-to-image, look here
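Note that the sampling side of classifier-free guidance is in this repo: the DDIM and PLMS samplers run a conditional and an unconditional (empty-prompt) forward pass in one batch and blend the two noise predictions, controlled by the --scale flag of scripts/txt2img.py. A condensed sketch of that combination, loosely following ldm/models/diffusion/ddim.py (names shortened):

    import torch

    def guided_eps(model, x, t, cond, uncond, guidance_scale=7.5):
        # Batch the unconditional (empty-prompt) and conditional passes together.
        x_in = torch.cat([x] * 2)
        t_in = torch.cat([t] * 2)
        c_in = torch.cat([uncond, cond])
        e_t_uncond, e_t = model.apply_model(x_in, t_in, c_in).chunk(2)
        # Classifier-free guidance: push the prediction away from the
        # unconditional direction, scaled by guidance_scale.
        return e_t_uncond + guidance_scale * (e_t - e_t_uncond)

With a scale of 1 this reduces to the plain conditional prediction, which is why the samplers skip the second pass in that case.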