Hi @ShoufaChen, thank you for the great work.

I am new to diffusion models and have a question about the scale factor applied in `prepare_diffusion_concat()` and `ddim_sample()`.

I understand that the signal-to-noise ratio is essential, as you mention in Section 4.4. In the implementation code for noise sampling, you shift and scale `x_start` before q_sampling, then shift and scale back to produce the model's diffused input. At inference, however, you first divide `x_boxes` by the scale and then scale back by multiplying. I thought the division by `scale` would not be needed, because the model learns from diffused boxes that have already been scaled back, as in `prepare_diffusion_concat()`. Even if scaling is needed for predicting the noise, I thought the order should be reversed, as in the noising step, so that the conditions are identical.

Could you explain why the scale is applied at the inference stage, and why the input is divided by it?
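To make the question concrete, here is a minimal sketch of how I currently read the two code paths. It is pure Python on a single box, and the helper names (`to_signal_space`, `to_model_space`, `training_input`, `inference_input`) and the value of `SCALE` are my own placeholders, not the repository's code, so it may not match the implementation exactly.

```python
SCALE = 2.0  # assumed signal scale; the value in the real config may differ


def to_signal_space(boxes01, scale=SCALE):
    """Map normalized coordinates in [0, 1] to [-scale, scale]."""
    return [(2.0 * v - 1.0) * scale for v in boxes01]


def to_model_space(x, scale=SCALE):
    """Clamp to [-scale, scale], then map back to [0, 1] for the model input."""
    clamped = [max(-scale, min(scale, v)) for v in x]
    return [(v / scale + 1.0) / 2.0 for v in clamped]


def q_sample(x_start, alpha_cumprod_t, noise):
    """Forward diffusion: sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * noise."""
    a = alpha_cumprod_t ** 0.5
    b = (1.0 - alpha_cumprod_t) ** 0.5
    return [a * x0 + b * n for x0, n in zip(x_start, noise)]


def training_input(gt_box01, alpha_cumprod_t, noise):
    """Training path as I read it: scale up, add noise, scale back down."""
    x_start = to_signal_space(gt_box01)
    x_t = q_sample(x_start, alpha_cumprod_t, noise)
    return to_model_space(x_t)


def inference_input(x_t):
    """Inference path as I read it: the sampler state is divided by the scale."""
    return to_model_space(x_t)


if __name__ == "__main__":
    noise = [0.3, -0.1, 0.8, -0.5]
    print(training_input([0.2, 0.4, 0.6, 0.8], 0.9, noise))
    print(inference_input([1.5, -0.7, 2.4, -3.0]))
```

In this reading, the division in `inference_input` is the step I am asking about, and multiplying the model's predicted boxes by the scale afterwards would correspond to `to_signal_space`.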

One more question: why is `self.ddim_sampling_eta` initialized to 1? Shouldn't eta be 0 for DDIM?

I would appreciate any feedback.
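For context on the eta question, this is a small sketch of how eta enters a DDIM update according to the standard formulation, where eta = 0 gives a deterministic step and eta = 1 injects DDPM-like noise. The function names and scalar inputs are hypothetical and chosen only for illustration; this is not the repository's sampler.

```python
import math


def ddim_sigma(alpha_cumprod_t, alpha_cumprod_prev, eta):
    """Standard deviation of the noise injected in one DDIM step."""
    return eta * math.sqrt(
        (1.0 - alpha_cumprod_prev) / (1.0 - alpha_cumprod_t)
        * (1.0 - alpha_cumprod_t / alpha_cumprod_prev)
    )


def ddim_step(x_start, pred_noise, alpha_cumprod_prev, sigma, noise):
    """One DDIM update on scalars, given predicted x_0 and predicted noise."""
    c = math.sqrt(1.0 - alpha_cumprod_prev - sigma ** 2)
    return math.sqrt(alpha_cumprod_prev) * x_start + c * pred_noise + sigma * noise


if __name__ == "__main__":
    for eta in (0.0, 1.0):
        sigma = ddim_sigma(0.5, 0.8, eta)
        print(eta, sigma, ddim_step(0.4, 0.1, 0.8, sigma, noise=1.0))
```

With eta = 0 the `noise` argument has no effect on the result, which is why I expected that setting for DDIM sampling.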