crowsonkb / k-diffusion

Karras et al. (2022) diffusion models for PyTorch
MIT License

Stabilizing DPM++2M SDE for SDXL #85

Open LuChengTHU opened 9 months ago

LuChengTHU commented 9 months ago

Hi @crowsonkb, long time no see! I'm opening this issue to discuss potential improvements to sampling methods for SDXL.

As I listed in https://github.com/crowsonkb/k-diffusion/issues/43, SDXL with DPM++2M produces visible artifacts due to numerical instability, especially with SDE solvers.

One possible fix is to use a first-order solver for the final step. For example, sampling with 5 steps would use orders [1,2,2,2,1] instead of [1,2,2,2,2], as discussed in https://github.com/crowsonkb/k-diffusion/issues/43. I also list more examples in https://github.com/huggingface/diffusers/pull/5541.
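
To make the order pattern concrete, here is a minimal sketch of a per-step order schedule for a second-order multistep solver such as DPM++2M. The function name `solver_orders` and the `lower_order_final` flag are hypothetical, not k-diffusion's or diffusers' API; the sketch assumes the first step is always first order because no previous model output exists yet to extrapolate from.

```python
from __future__ import annotations


def solver_orders(n_steps: int, lower_order_final: bool = True) -> list[int]:
    """Hypothetical helper: solver order used at each step of a 2nd-order multistep sampler.

    Step 0 is first order (no previous denoised estimate is available).
    With lower_order_final=True the last step also falls back to first order.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be positive")
    orders = [1] + [2] * (n_steps - 1)
    if lower_order_final:
        orders[-1] = 1
    return orders


print(solver_orders(5))                           # [1, 2, 2, 2, 1]
print(solver_orders(5, lower_order_final=False))  # [1, 2, 2, 2, 2]
```

A sampler loop would then apply its second-order correction only at steps whose entry is 2.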

Another possible fix is to change the step size schedule. For example, the Karras step size schedule you implemented is the most widely used one in the community, and it can significantly improve sample quality. Recently I found that the Karras schedule with $\rho=7$ is closely related to my "uniform logSNR" schedule, which was proposed in the original DPM-Solver paper.

Specifically, the "Karras sigma" is $\sigma_t / \alpha_t = \exp(-\lambda_t)$, where $\lambda_t = \log(\alpha_t / \sigma_t)$ is the half-log-SNR used in DPM-Solver, so the "log sigma" in Karras' setting is just $-\lambda_t$. Moreover, since Karras' schedule interpolates linearly in $\sigma^{1/\rho}$ with a hyperparameter $\rho$, one can show that as $\rho \to \infty$ the steps become uniform in $\lambda_t$, by the limit definition of the exponential function, $\exp(x) = \lim_{\rho \to \infty} \left(1 + \frac{x}{\rho}\right)^{\rho}$. As $\rho=7$ is already quite large, samples from Karras sigmas and from my uniform lambdas are similar when using ODE solvers, and both can reduce discretization errors.
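
The $\rho \to \infty$ argument can be written out explicitly. This sketch assumes the Karras et al. (2022) schedule in its usual form, $\sigma_i = \big(\sigma_{\max}^{1/\rho} + u\,(\sigma_{\min}^{1/\rho} - \sigma_{\max}^{1/\rho})\big)^{\rho}$ with $u = i/(N-1)$ for $i = 0, \dots, N-1$:

```latex
\begin{aligned}
\sigma_i &= \Big((1-u)\,\sigma_{\max}^{1/\rho} + u\,\sigma_{\min}^{1/\rho}\Big)^{\rho},
\qquad \sigma^{1/\rho} = \exp\!\Big(\frac{\log\sigma}{\rho}\Big) = 1 + \frac{\log\sigma}{\rho} + O(\rho^{-2}), \\
\sigma_i &= \Big(1 + \frac{(1-u)\log\sigma_{\max} + u\log\sigma_{\min}}{\rho} + O(\rho^{-2})\Big)^{\rho}
\;\xrightarrow{\;\rho\to\infty\;}\; \exp\!\big((1-u)\log\sigma_{\max} + u\log\sigma_{\min}\big), \\
\log\sigma_i &= (1-u)\log\sigma_{\max} + u\log\sigma_{\min}
+ \frac{u(1-u)}{2\rho}\big(\log\sigma_{\max} - \log\sigma_{\min}\big)^2 + O(\rho^{-2}).
\end{aligned}
```

The limit is linear in $u$, so the steps are uniform in $\log\sigma$ and therefore in $\lambda_t$. The last line keeps the first correction term: it is positive, so at finite $\rho$ the interior Karras sigmas sit above the log-uniform ones, and the gap grows with the squared log-range of the sigmas.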

However, for SDE solvers, the Karras step sizes and my uniform logSNR step sizes give quite different results, due to the Gaussian noise injected along the trajectory. For example, here is a cat generated with DPM++2M SDE, steps=25, SDXL (no refiner):
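
For reference, here is a minimal, framework-free sketch of a DPM++(2M) SDE sampler on a scalar state, written with Karras-style sigmas and step sizes measured in $\lambda = -\log\sigma$. It follows the data-prediction multistep structure with a midpoint-style second-order correction, but it is an illustrative reimplementation under stated assumptions, not k-diffusion's code; the names `dpmpp_2m_sde`, `denoise`, `eta` and `lower_order_final` are assumptions for this sketch. `denoise(x, sigma)` stands for any model that returns an estimate of the clean sample. With `eta=0` the noise term vanishes and the update reduces to the deterministic DPM++2M ODE step.

```python
import math
import random
from typing import Callable, List, Optional


def dpmpp_2m_sde(
    denoise: Callable[[float, float], float],
    x: float,
    sigmas: List[float],
    eta: float = 1.0,
    lower_order_final: bool = False,
    seed: int = 0,
) -> float:
    """Illustrative DPM++(2M) SDE sampler on a scalar (hypothetical helper, not a library API).

    sigmas must be strictly decreasing. A trailing 0.0 makes the last step return
    the denoised estimate (always first order), so lower_order_final only matters
    when the schedule ends at a nonzero sigma. eta=0 gives the deterministic ODE solver.
    """
    rng = random.Random(seed)
    old_denoised: Optional[float] = None
    h_last = 0.0
    n = len(sigmas) - 1
    for i in range(n):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        denoised = denoise(x, sigma)
        if sigma_next == 0.0:
            # Stepping onto sigma = 0: the first-order update reduces to the denoised estimate.
            x = denoised
        else:
            # Step size in lambda = -log(sigma).
            h = math.log(sigma) - math.log(sigma_next)
            eta_h = eta * h
            # First-order (exponential-integrator) part of the update.
            x = (sigma_next / sigma) * math.exp(-eta_h) * x - math.expm1(-h - eta_h) * denoised
            is_final = i == n - 1
            if old_denoised is not None and not (lower_order_final and is_final):
                # Midpoint-style second-order correction using the previous model output.
                r = h_last / h
                x += -0.5 * math.expm1(-h - eta_h) * (1.0 / r) * (denoised - old_denoised)
            if eta > 0.0:
                # Fresh Gaussian noise, scaled by sigma_next and the step size.
                x += rng.gauss(0.0, 1.0) * sigma_next * math.sqrt(-math.expm1(-2.0 * eta_h))
            h_last = h
        old_denoised = denoised
    return x
```

In this sketch, swapping `sigmas` between a Karras schedule and a log-uniform schedule is the only change needed to compare the two settings. With `eta > 0` the injected noise at each step depends on `sigma_next` and on the step size in $\lambda$, so the schedule shapes the stochastic part of the update as well as the deterministic part.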

[Image: example cat sample, DPM++2M SDE, 25 steps, SDXL without refiner]

I think the uniform logSNR step size schedule is quite interesting and can also produce beautiful samples, so it may bring new insights to the community. Could you please also integrate this schedule into k-diffusion?

The code is quite simple; see, for example, https://github.com/huggingface/diffusers/pull/5541/commits/892fec9b4314ca5e0ae2cf261494d483f839f572

crowsonkb commented 8 months ago

The uniform-in-log-SNR schedule has been in k-diffusion for a long time; it is https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/sampling.py#L26 :)
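
For comparison, here is a small plain-Python sketch of the two schedules discussed in this thread. `karras_sigmas` follows the Karras et al. (2022) formula and `log_uniform_sigmas` spaces sigmas uniformly in $\log\sigma$ (equivalently in $\lambda$); both function names, and the choice to append a final $\sigma = 0$, are assumptions for illustration rather than k-diffusion's API. The hypothetical helper `max_log_gap` measures how far a Karras schedule is from the log-uniform one, which makes the $\rho \to \infty$ argument easy to check numerically.

```python
import math
from typing import List


def karras_sigmas(n: int, sigma_min: float, sigma_max: float, rho: float = 7.0) -> List[float]:
    """Karras et al. (2022) schedule: interpolate linearly in sigma**(1/rho), then raise to rho."""
    lo, hi = sigma_min ** (1.0 / rho), sigma_max ** (1.0 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)] + [0.0]


def log_uniform_sigmas(n: int, sigma_min: float, sigma_max: float) -> List[float]:
    """Uniform spacing in log(sigma), i.e. uniform in lambda = -log(sigma)."""
    a, b = math.log(sigma_max), math.log(sigma_min)
    return [math.exp(a + i / (n - 1) * (b - a)) for i in range(n)] + [0.0]


def max_log_gap(rho: float, n: int = 11, sigma_min: float = 0.01, sigma_max: float = 10.0) -> float:
    """Largest |log(sigma_karras) - log(sigma_log_uniform)| over the nonzero sigmas."""
    k = karras_sigmas(n, sigma_min, sigma_max, rho)[:-1]
    u = log_uniform_sigmas(n, sigma_min, sigma_max)[:-1]
    return max(abs(math.log(p) - math.log(q)) for p, q in zip(k, u))


for rho in (1.0, 7.0, 100.0, 1e4):
    print(f"rho={rho:g}: max log-sigma gap {max_log_gap(rho):.4f}")
```

The gap shrinks roughly like $1/\rho$ as $\rho$ grows, consistent with the limit argument earlier in the thread.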

I intended to implement the first-order last step when I first saw this issue but forgot. I will get to it soon (it will be an option, off by default).