AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Composable Diffusion is not aligned with official implementation #9280

Open CCRcmcpe opened 1 year ago

CCRcmcpe commented 1 year ago

What happened?

https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch/blob/45abaa4395c7f427e9207bba2b7aabd440709922/composable_diffusion/composable_stable_diffusion/pipeline_composable_stable_diffusion.py#L539

In the official implementation (and in the paper, see Algorithm 1), the combination happens on ϵ (i.e. the predicted noise).

https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/22bcc7be428c94e9408f589966c2040187245d81/modules/sd_samplers_kdiffusion.py#L160

In the current implementation, the combination happens on x (i.e. the scores given by the sampler after a step), which may degrade output quality.

Commit where the problem happens

22bcc7be428c94e9408f589966c2040187245d81

CCRcmcpe commented 1 year ago

I'm not familiar with the maths behind diffusion models (e.g. how to compute x from ϵ, which seems to vary across implementations and samplers), so I'm not sure whether the two approaches are equivalent in effect. Further discussion is welcome.
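To make the equivalence question concrete, here is a minimal sketch under one stated assumption: that the denoised estimate is an affine function of the predicted noise, `denoised = x - sigma * eps`, with the same `x` and `sigma` used for every prompt within a step. That relation is an assumption for illustration and is not taken from the linked code; other parameterizations or samplers may differ. The helper names `eps_to_denoised` and `combine` are hypothetical. If the assumption holds, combining on ϵ and combining on the denoised output should give the same result up to floating-point error.

```python
import random

def eps_to_denoised(x, eps, sigma):
    # ASSUMPTION: epsilon-prediction with denoised = x - sigma * eps.
    return [xi - sigma * ei for xi, ei in zip(x, eps)]

def combine(uncond, conds, weights):
    # uncond + sum_i w_i * (cond_i - uncond), applied elementwise.
    out = list(uncond)
    for cond, w in zip(conds, weights):
        out = [o + w * (c - u) for o, c, u in zip(out, cond, uncond)]
    return out

random.seed(0)
n = 8
sigma = 3.5
x = [random.gauss(0, 1) for _ in range(n)]
eps_u = [random.gauss(0, 1) for _ in range(n)]
eps_conds = [[random.gauss(0, 1) for _ in range(n)] for _ in range(2)]
weights = [7.0, 4.0]

# Path A: combine the noise predictions, then convert to a denoised estimate.
path_a = eps_to_denoised(x, combine(eps_u, eps_conds, weights), sigma)

# Path B: convert each prediction first, then combine the denoised estimates.
den_u = eps_to_denoised(x, eps_u, sigma)
den_conds = [eps_to_denoised(x, e, sigma) for e in eps_conds]
path_b = combine(den_u, den_conds, weights)

max_diff = max(abs(a - b) for a, b in zip(path_a, path_b))
print(max_diff)
```

The two paths agree here because the `x` terms cancel inside each difference `(cond_i - uncond)`, leaving `-sigma` as a common factor. This says nothing about cases where the mapping from ϵ to x is not affine, or where different prompts are evaluated at different `x` or `sigma`.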

catboxanon commented 1 year ago

Bumping this to hopefully give it further attention. There is an implementation in https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/1695 that could perhaps serve as a reference, though I'm not sure it helps. Unfortunately I'm not familiar enough with this either to give more insight.

ljleb commented 1 year ago

https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/f865d3e11647dfd6c7b2cdf90dde24680e58acd8/modules/sd_samplers_kdiffusion.py#L80

This seems to be equivalent to equation 11 in the linked paper, minus the support for per-cond weighting. $\epsilon_u + \sum_{i=1}^{n} w (\epsilon_i - \epsilon_u)$ is the same as $\epsilon_u + w \sum_{i=1}^{n} (\epsilon_i - \epsilon_u)$, where $\epsilon_u$ is uncond, and $\epsilon_i$ is a positive prompt cond.
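As a quick numeric check of that identity, the sketch below evaluates both forms on random vectors standing in for the noise predictions. It also includes a per-cond weighted variant to show what the more general form would look like. All function names are hypothetical and nothing here is copied from the webui or the official repository.

```python
import random

def compose_weight_inside(eps_u, eps_conds, w):
    # eps_u + sum_i w * (eps_i - eps_u): weight applied to each term.
    out = list(eps_u)
    for eps_i in eps_conds:
        out = [o + w * (e - u) for o, e, u in zip(out, eps_i, eps_u)]
    return out

def compose_weight_outside(eps_u, eps_conds, w):
    # eps_u + w * sum_i (eps_i - eps_u): weight factored out of the sum.
    total = [0.0] * len(eps_u)
    for eps_i in eps_conds:
        total = [t + (e - u) for t, e, u in zip(total, eps_i, eps_u)]
    return [u + w * t for u, t in zip(eps_u, total)]

def compose_per_cond(eps_u, eps_conds, weights):
    # eps_u + sum_i w_i * (eps_i - eps_u): one weight per positive cond.
    out = list(eps_u)
    for eps_i, w_i in zip(eps_conds, weights):
        out = [o + w_i * (e - u) for o, e, u in zip(out, eps_i, eps_u)]
    return out

random.seed(0)
eps_u = [random.gauss(0, 1) for _ in range(6)]
eps_conds = [[random.gauss(0, 1) for _ in range(6)] for _ in range(3)]
w = 7.5

inside = compose_weight_inside(eps_u, eps_conds, w)
outside = compose_weight_outside(eps_u, eps_conds, w)
shared = compose_per_cond(eps_u, eps_conds, [w] * len(eps_conds))

gap_factoring = max(abs(a - b) for a, b in zip(inside, outside))
gap_per_cond = max(abs(a - b) for a, b in zip(inside, shared))
print(gap_factoring, gap_per_cond)
```

With a single shared weight, the per-cond form reduces to the shared-weight form, so the two only differ once the weights are allowed to vary per prompt.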

IIUC, algorithm 1 covers the maths for diffusion models. It is not directly related to composable diffusion.