CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation #50

Open · Aidenzich opened this issue 7 months ago

Aidenzich commented 7 months ago

Paper: https://arxiv.org/pdf/2310.01407.pdf (notes TBD)

Aidenzich commented 7 months ago

Mathematical Equations:

1. Noise prediction equation:

   $$\hat{\epsilon}_\theta(z_t, t) = \alpha_t \hat{v}_\theta(z_t, t) + \sigma_t z_t$$

2. Signal prediction equation:

   $$\hat{x}_\theta(z_t, t) = \alpha_t z_t - \sigma_t \hat{v}_\theta(z_t, t)$$

3. Adapted diffusion model prediction for the latent variable:

   $$\hat{z}_s = \alpha_s \hat{x}_\theta(z_t, c, t) + \sigma_s \epsilon, \quad \text{with } z_t = \alpha_t x + \sigma_t \epsilon$$
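Put together, the three relations are easy to implement once the schedule coefficients are known. The following is a minimal NumPy sketch, not the paper's code: it assumes a variance-preserving schedule with $\alpha_t^2 + \sigma_t^2 = 1$ and the standard v-parameterization target $v = \alpha_t \epsilon - \sigma_t x$; `v_model`, `v_to_eps_and_x`, and `adapted_latent` are illustrative names only.

```python
import numpy as np

def v_to_eps_and_x(v_hat, z_t, alpha_t, sigma_t):
    """Equations 1 and 2: convert a v-prediction into noise and signal
    estimates, assuming a variance-preserving schedule (alpha_t^2 + sigma_t^2 = 1)."""
    eps_hat = alpha_t * v_hat + sigma_t * z_t   # eq. 1: noise prediction
    x_hat = alpha_t * z_t - sigma_t * v_hat     # eq. 2: signal prediction
    return eps_hat, x_hat

def adapted_latent(x, eps, c, t, alpha_t, sigma_t, alpha_s, sigma_s, v_model):
    """Equation 3: noise the clean latent x to z_t, query the conditional
    model, and form the predicted latent z_s_hat at the earlier time s."""
    z_t = alpha_t * x + sigma_t * eps                      # z_t = alpha_t x + sigma_t eps
    v_hat = v_model(z_t, c, t)                             # hypothetical conditional v-prediction net
    _, x_hat = v_to_eps_and_x(v_hat, z_t, alpha_t, sigma_t)
    return alpha_s * x_hat + sigma_s * eps                 # z_s_hat

# Toy usage with a dummy "model" that returns the exact v-target, so the
# recovered x_hat equals x and z_s_hat is exactly the noised latent at time s.
x, eps = np.array([0.5]), np.array([0.1])
alpha_t, sigma_t = np.cos(0.3), np.sin(0.3)                # VP schedule: alpha^2 + sigma^2 = 1
alpha_s, sigma_s = np.cos(0.1), np.sin(0.1)
exact_v = lambda z_t, c, t: alpha_t * eps - sigma_t * x    # v-target under this schedule
print(adapted_latent(x, eps, None, None, alpha_t, sigma_t, alpha_s, sigma_s, exact_v))
```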

Symbol Descriptions:

| Symbol | Description |
| --- | --- |
| $\hat{\epsilon}_\theta(z_t, t)$ | Estimated noise prediction at time $t$ for latent state $z_t$. |
| $\hat{v}_\theta(z_t, t)$ | Output of the parameterized diffusion model for latent state $z_t$ at time $t$. |
| $\alpha_t, \sigma_t$ | Time-dependent scaling factors of the diffusion process. |
| $\hat{x}_\theta(z_t, t)$ | Estimated signal prediction at time $t$ for latent state $z_t$. |
| $\hat{z}_s$ | Predicted latent variable at time $s$ in the adapted diffusion model. |
| $\alpha_s, \sigma_s$ | Scaling factors at time $s$, playing the same role as $\alpha_t, \sigma_t$. |
| $\epsilon$ | Noise term sampled from a standard normal distribution. |
| $z_t$ | Latent state at time $t$. |
| $x$ | Original input data to the diffusion process. |
| $c$ | Conditioning variable of the conditional diffusion model. |
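
As a quick sanity check on these definitions (not spelled out above, and assuming the standard v-parameterization target $v_t = \alpha_t \epsilon - \sigma_t x$ with $\alpha_t^2 + \sigma_t^2 = 1$), substituting $z_t = \alpha_t x + \sigma_t \epsilon$ into the signal prediction equation recovers the clean input:

$$
\alpha_t z_t - \sigma_t v_t
= \alpha_t(\alpha_t x + \sigma_t \epsilon) - \sigma_t(\alpha_t \epsilon - \sigma_t x)
= (\alpha_t^2 + \sigma_t^2)\, x = x .
$$

The same substitution in equation 1 returns $\epsilon$, so the two predictions are consistent with the forward process used in equation 3.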