CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation #50

Open · Aidenzich opened this issue 7 months ago

Aidenzich commented 7 months ago

Paper: https://arxiv.org/pdf/2310.01407.pdf (notes TBD)

Aidenzich commented 7 months ago

Mathematical Equations:

1. Noise prediction equation:

   $$\hat{\epsilon}_\theta(z_t, t) = \alpha_t \hat{v}_\theta(z_t, t) + \sigma_t z_t$$

2. Signal prediction equation:

   $$\hat{x}_\theta(z_t, t) = \alpha_t z_t - \sigma_t \hat{v}_\theta(z_t, t)$$

3. Adapted diffusion model prediction for the latent variable:

   $$\hat{z}_s = \alpha_s \hat{x}_\theta(z_t, c, t) + \sigma_s \epsilon, \quad \text{with } z_t = \alpha_t x + \sigma_t \epsilon$$
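Put together, the three relations are easy to implement once the schedule coefficients are known. The following is a minimal NumPy sketch, not the paper's code: it assumes a variance-preserving schedule with $\alpha_t^2 + \sigma_t^2 = 1$ and the standard v-parameterization target $v = \alpha_t \epsilon - \sigma_t x$; `v_model`, `v_to_eps_and_x`, and `adapted_latent` are illustrative names only.

```python
import numpy as np

def v_to_eps_and_x(v_hat, z_t, alpha_t, sigma_t):
    """Equations 1 and 2: convert a v-prediction into noise and signal
    estimates, assuming a variance-preserving schedule (alpha_t^2 + sigma_t^2 = 1)."""
    eps_hat = alpha_t * v_hat + sigma_t * z_t   # eq. 1: noise prediction
    x_hat = alpha_t * z_t - sigma_t * v_hat     # eq. 2: signal prediction
    return eps_hat, x_hat

def adapted_latent(x, eps, c, t, alpha_t, sigma_t, alpha_s, sigma_s, v_model):
    """Equation 3: noise the clean latent x to z_t, query the conditional
    model, and form the predicted latent z_s_hat at the earlier time s."""
    z_t = alpha_t * x + sigma_t * eps                      # z_t = alpha_t x + sigma_t eps
    v_hat = v_model(z_t, c, t)                             # hypothetical conditional v-prediction net
    _, x_hat = v_to_eps_and_x(v_hat, z_t, alpha_t, sigma_t)
    return alpha_s * x_hat + sigma_s * eps                 # z_s_hat

# Toy usage with a dummy "model" that returns the exact v-target, so the
# recovered x_hat equals x and z_s_hat is exactly the noised latent at time s.
x, eps = np.array([0.5]), np.array([0.1])
alpha_t, sigma_t = np.cos(0.3), np.sin(0.3)                # VP schedule: alpha^2 + sigma^2 = 1
alpha_s, sigma_s = np.cos(0.1), np.sin(0.1)
exact_v = lambda z_t, c, t: alpha_t * eps - sigma_t * x    # v-target under this schedule
print(adapted_latent(x, eps, None, None, alpha_t, sigma_t, alpha_s, sigma_s, exact_v))
```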

Symbol Descriptions:

| Symbol | Description |
| --- | --- |
| $\hat{\epsilon}_\theta(z_t, t)$ | Estimated noise prediction at time $t$ for latent state $z_t$. |
| $\hat{v}_\theta(z_t, t)$ | Output of the parameterized diffusion model for latent state $z_t$ at time $t$. |
| $\alpha_t, \sigma_t$ | Time-dependent scaling factors of the diffusion process. |
| $\hat{x}_\theta(z_t, t)$ | Estimated signal prediction at time $t$ for latent state $z_t$. |
| $\hat{z}_s$ | Predicted latent variable at time $s$ in the adapted diffusion model. |
| $\alpha_s, \sigma_s$ | Scaling factors at time $s$, playing the same role as $\alpha_t, \sigma_t$. |
| $\epsilon$ | Noise term sampled from a standard normal distribution. |
| $z_t$ | Latent state at time $t$. |
| $x$ | Original input data to the diffusion process. |
| $c$ | Conditioning variable of the conditional diffusion model. |
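
As a quick sanity check on these definitions (not spelled out above, and assuming the standard v-parameterization target $v_t = \alpha_t \epsilon - \sigma_t x$ with $\alpha_t^2 + \sigma_t^2 = 1$), substituting $z_t = \alpha_t x + \sigma_t \epsilon$ into the signal prediction equation recovers the clean input:

$$
\alpha_t z_t - \sigma_t v_t
= \alpha_t(\alpha_t x + \sigma_t \epsilon) - \sigma_t(\alpha_t \epsilon - \sigma_t x)
= (\alpha_t^2 + \sigma_t^2)\, x = x .
$$

The same substitution in equation 1 returns $\epsilon$, so the two predictions are consistent with the forward process used in equation 3.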