crowsonkb / k-diffusion

Karras et al. (2022) diffusion models for PyTorch
MIT License

Questions about soft-min-snr loss #98

Open hmicrobe opened 5 months ago

hmicrobe commented 5 months ago

Really nice work! While reading through the paper, I had some questions about the proposed soft-min-snr loss. I would appreciate your feedback on them.

  1. Eq. (5) of the Hourglass Diffusion Transformers paper is said to incorporate c_out^{-2}(\sigma). However, based on the definition of c_out, eq. (5) should be
min(SNR, \gamma) * (\sigma_data^2 + \sigma^2) / (\sigma_data^2 * \sigma^2).
  2. In the implementation: https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/layers.py#L64-L65

The \gamma=4 or 5 proposed in the paper doesn't seem to be used. Am I missing anything here?

stefan-baumann commented 5 months ago

In this code, gamma is not a free parameter: it is hardcoded to depend on sigma_data, chosen as gamma = sigma_data^-2. This, combined with the preconditioner compensation, leads to the formula you're seeing.
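
One way to sanity-check this reading numerically is the sketch below. It rests on several assumptions that are not confirmed in this thread: that the linked lines compute `(sigma * sigma_data)^2 / (sigma^2 + sigma_data^2)^2`, that the "soft" minimum is the smooth stand-in `1 / (sigma^2 + 1/gamma)` for `min(1/sigma^2, gamma)`, and that the "preconditioner compensation" is a factor of `c_out^2` (with `c_out` as defined by Karras et al. (2022)) applied because the loss is taken on the network output. All function names are hypothetical and not part of k-diffusion.

```python
import math


def c_out_sq(sigma, sigma_data):
    # c_out^2 from the Karras et al. (2022) preconditioner:
    # sigma^2 * sigma_data^2 / (sigma^2 + sigma_data^2)
    return (sigma * sigma_data) ** 2 / (sigma ** 2 + sigma_data ** 2)


def weighting_as_in_code(sigma, sigma_data):
    # ASSUMPTION: the form of the soft-min-snr weighting at the linked lines.
    return (sigma * sigma_data) ** 2 / (sigma ** 2 + sigma_data ** 2) ** 2


def weighting_with_gamma(sigma, gamma, sigma_data):
    # ASSUMPTION: smooth stand-in for min(1/sigma^2, gamma),
    # multiplied by c_out^2 as the preconditioner compensation.
    soft_min = 1.0 / (sigma ** 2 + 1.0 / gamma)
    return soft_min * c_out_sq(sigma, sigma_data)


sigma_data = 0.5
gamma = sigma_data ** -2  # the hardcoded choice described in the reply

for sigma in (0.01, 0.1, 1.0, 10.0, 80.0):
    a = weighting_as_in_code(sigma, sigma_data)
    b = weighting_with_gamma(sigma, gamma, sigma_data)
    # Under the assumptions above the two expressions agree.
    assert math.isclose(a, b, rel_tol=1e-12)
```

Under these assumptions the two expressions coincide for every sigma, which would explain why no separate gamma value (such as 4 or 5) appears in the code. Passing a different `gamma` to `weighting_with_gamma` gives a different curve, so the agreement is specific to gamma = sigma_data^-2.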