Open CCRcmcpe opened 1 year ago
I'm not familiar with the math behind diffusion models (e.g. how to compute x from ϵ, which seems to vary across implementations and samplers), so I'm not sure whether the two are equivalent in effect. Further discussion is welcome.
Bumping this to hopefully get it more attention. There is an implementation in https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/1695 that could maybe be referenced; not sure if that helps. Unfortunately, I'm not familiar enough with this either to give more insight.
This seems to be equivalent to equation 11 in the linked paper, minus the support for per-cond weighting. $\epsilon_u + \sum_{i \in 1..n} w (\epsilon_i - \epsilon_u)$ is the same as $\epsilon_u + w \sum_{i \in 1..n} (\epsilon_i - \epsilon_u)$, where $\epsilon_u$ is uncond, and $\epsilon_i$ is a positive prompt cond.
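A quick numerical check of the identity above, as a minimal NumPy sketch. The arrays are random stand-ins for noise predictions, and `w`, `eps_u`, and `eps_conds` are illustrative names, not identifiers from either code base.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for noise predictions (not real model outputs).
eps_u = rng.standard_normal((4, 8, 8))         # unconditional prediction
eps_conds = rng.standard_normal((3, 4, 8, 8))  # n = 3 conditional predictions
w = 7.5                                        # one weight shared by all conds

# Weight applied inside the sum, term by term.
inside = eps_u + sum(w * (eps_i - eps_u) for eps_i in eps_conds)

# Weight factored out of the sum.
outside = eps_u + w * sum(eps_i - eps_u for eps_i in eps_conds)

# True when both forms agree up to floating-point tolerance.
same = bool(np.allclose(inside, outside))
print(same)
```

With per-cond weights $w_i$ the factor can no longer be pulled out of the sum, which is the part that a single shared `w` does not cover.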
IIUC, Algorithm 1 covers the general math of diffusion models. It is not directly related to composable diffusion.
What happened?
https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch/blob/45abaa4395c7f427e9207bba2b7aabd440709922/composable_diffusion/composable_stable_diffusion/pipeline_composable_stable_diffusion.py#L539
In the official implementation (and in the paper, see Algorithm 1), the combination happens on ϵ (i.e. the predicted noise).
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/22bcc7be428c94e9408f589966c2040187245d81/modules/sd_samplers_kdiffusion.py#L160
In the current implementation, the combination happens on x (i.e. the scores given by the sampler after a step), which may degrade output quality.
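Whether the two placements actually differ depends on how x is obtained from ϵ. As a minimal sketch, assume the sampler wrapper computes its output as `x_in - sigma * eps` (an eps-prediction convention; this is an assumption, not something confirmed in this issue), with the same `x_in` and `sigma` for every prompt. Under that assumption the map from ϵ to x is affine, so combining on ϵ or on x should give the same result. All names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_denoised(x_in, sigma, eps):
    # ASSUMPTION: the wrapper output is affine in eps: x_in - sigma * eps.
    return x_in - sigma * eps

def combine(uncond, conds, weights):
    # uncond + sum_i w_i * (cond_i - uncond), with per-cond weights.
    out = uncond.copy()
    for cond, w in zip(conds, weights):
        out = out + w * (cond - uncond)
    return out

# Random stand-ins for the noisy latent and the noise predictions.
x_in = rng.standard_normal((4, 8, 8))
sigma = 3.0
eps_u = rng.standard_normal((4, 8, 8))
eps_conds = [rng.standard_normal((4, 8, 8)) for _ in range(2)]
weights = [7.5, 5.0]

# Combine on eps first, then convert to x.
on_eps = to_denoised(x_in, sigma, combine(eps_u, eps_conds, weights))

# Convert each prediction to x first, then combine.
on_x = combine(
    to_denoised(x_in, sigma, eps_u),
    [to_denoised(x_in, sigma, e) for e in eps_conds],
    weights,
)

# True when both orders agree up to floating-point tolerance.
equivalent = bool(np.allclose(on_eps, on_x))
print(equivalent)
```

If the wrapper is not affine in ϵ, or if `x_in` or `sigma` differ between prompts, the equivalence no longer follows from this argument.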
Commit where the problem happens
22bcc7be428c94e9408f589966c2040187245d81