Thank you for this inspiring work!
I have a small question, if I may, regarding the noise regularization process.
In models trained with the DDPM objective (such as the one used in pix2pix-zero), the noise regularization process makes sense, as the UNet is optimized to predict Gaussian noise with mean = 0 and var = 1. However, in distilled DMs, and in ADDs in particular, this kind of loss is not part of the training scheme. As a result, the outputs of the time-distilled UNet in models like SD-turbo and SDXL-turbo do not necessarily follow a Gaussian distribution with those parameters [0, 1]. In fact, the variance of the model's output decreases as t approaches 0.
Here are some example logs (produced with SD-turbo using 4 inference steps) to illustrate this:
timesteps 999.0: noise_pred mean = 0.0030574470292776823 | noise_pred var = 0.978159487247467
timesteps 749.0: noise_pred mean = 0.0017684325575828552 | noise_pred var = 0.9806435704231262
timesteps 499.0: noise_pred mean = -0.0025877265725284815 | noise_pred var = 0.877240777015686
timesteps 249.0: noise_pred mean = -0.0006634604651480913 | noise_pred var = 0.7107224464416504
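For reference, the statistics above are just the sample mean and variance of the UNet's noise prediction at each step. A minimal sketch of how such logging can be done (the `log_noise_stats` helper is my own illustrative function, not part of diffusers; in a real run it would be called on `noise_pred` inside the pipeline's denoising loop, here it is fed a standard-normal array via numpy to keep the example self-contained):

```python
import numpy as np

def log_noise_stats(noise_pred, timestep):
    """Return (mean, var) of the predicted noise at a given timestep."""
    mean = float(noise_pred.mean())
    var = float(noise_pred.var())
    print(f"timesteps {timestep}: noise_pred mean = {mean} | noise_pred var = {var}")
    return mean, var

# A true standard-normal sample has variance ~1, matching the t = 999 log;
# the logs above show the variance drifting well below 1 as t approaches 0.
rng = np.random.default_rng(0)
eps = rng.standard_normal((4, 64, 64))
log_noise_stats(eps, 999.0)
```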
Considering this, do you think noise regularization is still relevant in ADDs?
Thank you in advance for your time!
Or