Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

PLMS sampling is broken #45

Open xiankgx opened 1 year ago

xiankgx commented 1 year ago

Using the 768 v-diffusion model with the prompt "fruit basket".

With DDIM sampling: [image]

With PLMS sampling: [image]

zwishenzug commented 1 year ago

Can confirm.

I managed to get better output by adding in some code from the DDIM version. It has new parts for 'v' prediction, like this:

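    # For a v-prediction model, convert the raw output into a noise estimate.
    if self.model.parameterization == "v":
        e_t = self.model.predict_eps_from_z_and_v(x, t, model_output)

    # Predict x0 from the noise estimate, or directly from v for a v-prediction model.
    if self.model.parameterization != "v":
        pred_x0 = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()
    else:
        pred_x0 = self.model.predict_start_from_z_and_v(x, t, model_output)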

The equivalent isn't in the PLMS code yet.

But I couldn't quite get it fully correct.
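
For reference, here is a minimal sketch of the conversion those two helper calls perform, assuming the standard v-parameterization, v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0. The function names below are hypothetical scalar stand-ins written for illustration, not the repository's methods, and the sample values are arbitrary. It also shows roughly what goes wrong if a sampler treats the v output as if it were the noise estimate.

```python
import math

# Hypothetical scalar stand-ins for the tensor helpers; alpha_bar is the
# cumulative product of alphas at the current timestep (an assumption of
# this sketch, passed in directly instead of being looked up from t).

def noisy_sample(x0, eps, alpha_bar):
    # Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps

def v_target(x0, eps, alpha_bar):
    # v-parameterization: v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0
    return math.sqrt(alpha_bar) * eps - math.sqrt(1.0 - alpha_bar) * x0

def eps_from_z_and_v(x_t, v, alpha_bar):
    # Recover the noise estimate from the noisy sample and v.
    return math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * x_t

def x0_from_z_and_v(x_t, v, alpha_bar):
    # Recover the clean-sample estimate from the noisy sample and v.
    return math.sqrt(alpha_bar) * x_t - math.sqrt(1.0 - alpha_bar) * v

# Arbitrary illustrative values.
x0, eps, alpha_bar = 0.5, -1.2, 0.3
x_t = noisy_sample(x0, eps, alpha_bar)
v = v_target(x0, eps, alpha_bar)

eps_rec = eps_from_z_and_v(x_t, v, alpha_bar)
x0_rec = x0_from_z_and_v(x_t, v, alpha_bar)

# What an eps-only sampler would compute if it mistook v for the noise.
x0_wrong = (x_t - math.sqrt(1.0 - alpha_bar) * v) / math.sqrt(alpha_bar)

print(eps_rec, x0_rec, x0_wrong)
```

With the conversions in place, `eps_rec` and `x0_rec` match the original `eps` and `x0` up to floating-point error, while `x0_wrong` does not, which is consistent with a sampler that lacks the 'v' branches producing bad images.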