Diffusion With Offset Noise

Is your feature request related to a problem? Please describe.

The diffusion model's ability to generate extreme images (for example, very dark or very bright scenes) is limited by the structure of the noise used in the training loop. The model is trained on independent and identically distributed (iid) per-pixel noise, so the longest-wavelength feature, the image-wide mean, is perturbed with a variance of only 1/N for an image of N pixels. This restricts how freely the model can explore the overall palette of a scene, which in turn correlates with other aspects of presentation and composition.
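To make the 1/N scaling concrete, here is a minimal sketch that measures how much the image-wide mean of iid unit-variance Gaussian noise actually varies. The function name, trial count, and image sizes are illustrative assumptions, not part of any existing training script. The variance of the mean of N iid samples is 1/N, so its standard deviation should track 1/sqrt(N).

```python
import numpy as np


def mean_std_of_iid_noise(num_pixels, num_trials=2000, seed=0):
    """Empirical std of the image-wide mean of iid unit Gaussian noise.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Each row is one flattened "image" of iid noise.
    noise = rng.standard_normal((num_trials, num_pixels))
    # The mean over pixels is the zero-frequency (longest-wavelength) component.
    return noise.mean(axis=1).std()


if __name__ == "__main__":
    for side in (8, 16, 32):
        n = side * side
        measured = mean_std_of_iid_noise(n)
        predicted = 1.0 / np.sqrt(n)
        print(f"N={n:5d}  measured std={measured:.4f}  1/sqrt(N)={predicted:.4f}")
```

The larger the image, the less the noise ever moves the overall brightness, which is why a model trained this way has little incentive to learn to shift it.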

Describe the solution you'd like

I would like the training loop to support a modified noise structure that gives greater control over the longest-wavelength feature when generating extreme images. Specifically, a single iid sample that is constant over the entire image would be added to each pixel's iid sample, so that the zero-frequency component is randomized ~10 times faster than under the base distribution.
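A minimal sketch of the requested noise structure, written with NumPy so it runs on its own. The function name `sample_offset_noise`, the `offset_strength` default of 0.1, the (batch, channels, height, width) layout, and the choice to draw one shared sample per image and channel are all assumptions made for illustration; they are not taken from the current training code.

```python
import numpy as np


def sample_offset_noise(shape, offset_strength=0.1, rng=None):
    """Per-pixel iid Gaussian noise plus one shared sample per (image, channel).

    Hypothetical helper; `offset_strength` is an assumed default.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, channels, _height, _width = shape
    # Standard iid noise: one independent sample per pixel.
    per_pixel = rng.standard_normal(shape)
    # One sample per image and channel, broadcast over every pixel.
    shared = rng.standard_normal((batch, channels, 1, 1))
    return per_pixel + offset_strength * shared


if __name__ == "__main__":
    noise = sample_offset_noise((4, 4, 64, 64), rng=np.random.default_rng(0))
    print(noise.shape)
    # Per-image, per-channel mean: the component the offset is meant to randomize.
    print(noise.mean(axis=(2, 3)))
```

In a training loop this would presumably replace the call that samples the noise added to the latents, with the rest of the loop left unchanged. Setting `offset_strength` to 0 recovers the base iid noise, which would let the feature sit behind an optional flag.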

Describe alternatives you've considered

Increasing the number of sampling steps can help the model generate more extreme images, but it is not a drop-in solution and may not always produce the desired results.

Additional context

The proposed feature modifies the structure of the noise used in the training loop: a single iid sample shared across the entire image is added to each pixel's iid sample, so the longest-wavelength feature is randomized ~10 times faster than under the base distribution. This approach has been shown to significantly change the behavior of Stable Diffusion and improve the generation of extreme images, without negatively impacting its ability to generate images from the previous distribution.
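As a rough check on the size of the effect, the variance of the image-wide mean can be worked out directly. The symbols here are assumptions introduced for this sketch: N is the number of pixels, s is the strength of the shared sample, z_i is the per-pixel noise, and c is the shared sample.

```latex
\begin{align}
\epsilon_i &= z_i + s\,c, \qquad z_i,\ c \sim \mathcal{N}(0, 1)\ \text{iid} \\
\bar{\epsilon} &= \frac{1}{N}\sum_{i=1}^{N}\epsilon_i
               = \frac{1}{N}\sum_{i=1}^{N} z_i + s\,c \\
\operatorname{Var}(\bar{\epsilon}) &= \frac{1}{N} + s^2 \\
\frac{\operatorname{std}(\bar{\epsilon})}{\operatorname{std}(\bar{z})}
  &= \sqrt{\frac{1/N + s^2}{1/N}} = \sqrt{1 + N s^2}
\end{align}
```

With an assumed s = 0.1 and N = 64 × 64 = 4096, the ratio is about 6.5, the same order of magnitude as the ~10 times figure above. The exact factor depends on the image size and the chosen strength.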

reference: Diffusion With Offset Noise

ShivamShrirao / diffusers

Diffusion With Offset Noise #221