Thank you for this inspiring work!
I have a small question, if I may, regarding the noise regularization process.
In models trained with the DDPM objective (such as the one used in pix2pix-zero), the noise regularization process makes sense, as the UNet is optimized to predict Gaussian noise with mean = 0 and var = 1. However, in distilled DMs, and in ADDs in particular, this kind of loss is not part of the training scheme. As a result, the outputs of the time-distilled UNet in models like SD-turbo and SDXL-turbo do not necessarily follow a Gaussian distribution with those parameters [0, 1]. In fact, the variance of the model's output decreases as t approaches 0.
Here are some example logs (produced with SD-turbo using 4 inference steps) to illustrate this:
timesteps 999.0: noise_pred mean = 0.0030574470292776823 | noise_pred var = 0.978159487247467
timesteps 749.0: noise_pred mean = 0.0017684325575828552 | noise_pred var = 0.9806435704231262
timesteps 499.0: noise_pred mean = -0.0025877265725284815 | noise_pred var = 0.877240777015686
timesteps 249.0: noise_pred mean = -0.0006634604651480913 | noise_pred var = 0.7107224464416504
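For reference, the statistics above are just the sample mean and variance of the UNet's noise prediction at each step. A minimal sketch of how such logging can be done (the `log_noise_stats` helper is my own illustrative function, not part of diffusers; in a real run it would be called on `noise_pred` inside the pipeline's denoising loop, here it is fed a standard-normal array via numpy to keep the example self-contained):

```python
import numpy as np

def log_noise_stats(noise_pred, timestep):
    """Return (mean, var) of the predicted noise at a given timestep."""
    mean = float(noise_pred.mean())
    var = float(noise_pred.var())
    print(f"timesteps {timestep}: noise_pred mean = {mean} | noise_pred var = {var}")
    return mean, var

# A true standard-normal sample has variance ~1, matching the t = 999 log;
# the logs above show the variance drifting well below 1 as t approaches 0.
rng = np.random.default_rng(0)
eps = rng.standard_normal((4, 64, 64))
log_noise_stats(eps, 999.0)
```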
Considering this, do you think noise regularization is still relevant in ADDs?
Thank you in advance for your time!
Or