crowsonkb / k-diffusion

Karras et al. (2022) diffusion models for PyTorch
MIT License

(Missing) Loss weights for the Diffusion Loss? #45

Closed by mbreuss 1 year ago

mbreuss commented 1 year ago

Hi,

Thank you for the great work! I have a general question about the loss weighting for the k-diffusion variant after reading the related paper by Karras et al. (2022):

The loss used for training the diffusion model has no additional scaling (https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/layers.py#L31), whereas Table 1 of the paper specifies an additional loss weighting for this model: $$\lambda(\sigma) = (\sigma^{2} + \sigma_{\text{data}}^{2}) / (\sigma \cdot \sigma_{\text{data}})^{2}$$

Did I miss it somewhere else in the code or is there a reason for not using it? Thanks!

Edit:

My mistake, it is actually there. The general loss of the diffusion model is defined in Eq. (2) of the paper:

$$\mathbb{E}_{\mathbf{y} \sim p_{\text{data}}} \mathbb{E}_{\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^{2} \mathbf{I})} \lVert D(\mathbf{y} + \mathbf{n}; \sigma) - \mathbf{y} \rVert_{2}^{2}$$

The loss computation at https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/layers.py#L31 does not compute the full denoiser

$$D_{\theta}(\mathbf{x}; \sigma) = c_{\text{skip}}(\sigma) \mathbf{x} + c_{\text{out}}(\sigma) F_{\theta}(c_{\text{in}}(\sigma) \mathbf{x}; c_{\text{noise}}(\sigma))$$

and uses the inner network output instead:

$$F_{\theta}(c_{\text{in}}(\sigma) \mathbf{x}; c_{\text{noise}}(\sigma))$$

Written in terms of $F_{\theta}$, the loss takes the form of Eq. (8) of the paper, where the weighting term cancels:

$$\mathbb{E}_{\sigma, \mathbf{y}, \mathbf{n}} [ \lambda(\sigma) a ]$$

with:

$$a = c_{\text{out}}(\sigma)^{2} \lVert F_{\theta}(c_{\text{in}}(\sigma)(\mathbf{y} + \mathbf{n}); c_{\text{noise}}(\sigma)) - \frac{1}{c_{\text{out}}(\sigma)} (\mathbf{y} - c_{\text{skip}}(\sigma)(\mathbf{y} + \mathbf{n})) \rVert_{2}^{2}$$

Since the paper sets $\lambda(\sigma) = 1 / c_{\text{out}}(\sigma)^{2}$, the prefactor $\lambda(\sigma) c_{\text{out}}(\sigma)^{2}$ equals 1, so no scaling is needed in the code.
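For anyone who wants to convince themselves of the cancellation, here is a minimal numerical sketch in plain Python. It is not code from k-diffusion: the function names are made up for illustration, `SIGMA_DATA = 0.5` is an assumed value, the inputs are scalars rather than tensors, and the network output is replaced by a random number. It compares the weighted loss on the denoiser output with the unweighted squared error on the inner network output, using the Table 1 definitions of `c_skip`, `c_out` and `lambda`.

```python
import math
import random

SIGMA_DATA = 0.5  # assumed data standard deviation (illustrative value)


def c_skip(sigma, sigma_data=SIGMA_DATA):
    """Skip scaling: sigma_data^2 / (sigma^2 + sigma_data^2)."""
    return sigma_data**2 / (sigma**2 + sigma_data**2)


def c_out(sigma, sigma_data=SIGMA_DATA):
    """Output scaling: sigma * sigma_data / sqrt(sigma^2 + sigma_data^2)."""
    return sigma * sigma_data / math.sqrt(sigma**2 + sigma_data**2)


def loss_weight(sigma, sigma_data=SIGMA_DATA):
    """Loss weighting lambda(sigma) = (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2."""
    return (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2


def weighted_denoiser_loss(f_out, y, n, sigma):
    """lambda(sigma) * (D(y + n; sigma) - y)^2 for scalar inputs."""
    x = y + n
    denoised = c_skip(sigma) * x + c_out(sigma) * f_out
    return loss_weight(sigma) * (denoised - y) ** 2


def inner_network_loss(f_out, y, n, sigma):
    """Unweighted squared error between the inner network output and its target."""
    x = y + n
    target = (y - c_skip(sigma) * x) / c_out(sigma)
    return (f_out - target) ** 2


rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    # Noise level drawn log-uniformly from an arbitrary illustrative range.
    sigma = math.exp(rng.uniform(math.log(0.01), math.log(80.0)))
    y = rng.gauss(0.0, SIGMA_DATA)   # clean sample
    n = rng.gauss(0.0, sigma)        # added noise
    f_out = rng.gauss(0.0, 1.0)      # stand-in for the network output F
    a = weighted_denoiser_loss(f_out, y, n, sigma)
    b = inner_network_loss(f_out, y, n, sigma)
    max_gap = max(max_gap, abs(a - b) / max(1.0, abs(b)))

print(f"largest gap between the two losses: {max_gap:.2e}")
```

The two losses should agree up to floating-point rounding for every noise level, which is why a plain squared error on the inner network output already carries the paper's weighting.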