PanXiebit closed 2 years ago
Because iteratively predicting x_0 refines the prediction step by step; this is the key idea of diffusion models.
Thanks a lot, I understand. Since the x_0 predicted from pure noise is not good, iteratively refining the prediction is necessary.
https://github.com/cientgu/VQ-Diffusion/blob/37bbcccdd4aef1794dac645128d864a9f69ed985/image_synthesis/modeling/transformers/diffusion_transformer.py?_pjax=%23js-repo-pjax-container#L186 https://github.com/cientgu/VQ-Diffusion/blob/37bbcccdd4aef1794dac645128d864a9f69ed985/image_synthesis/modeling/transformers/diffusion_transformer.py?_pjax=%23js-repo-pjax-container#L240
As shown at line 186, you predict x_0 from x_t at any timestep with the transformer model. At line 240, in the inference step x_t -> x_{t-1}, you first predict x_0 with p(x_0|x_t) and then predict x_{t-1} using the q_posterior function, q(x_{t-1}|x_t, x_0).
So why not predict x_0 directly with p(x_0|x_T) at inference?
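To make the two options concrete, here is a minimal sketch of the loop described above. It is not the repository's code. It assumes a simplified, mask-only (absorbing) forward process with a linear schedule, where the masking probability at step t is t/T, so the posterior reveals a masked token with probability 1/t. The names `MASK`, `posterior_step`, `sample`, and the `denoiser` callable (a stand-in for the transformer p(x_0|x_t)) are all hypothetical.

```python
import random

MASK = -1  # hypothetical id for the absorbing [MASK] token


def posterior_step(x_t, x0_pred, t, rng):
    """Sample x_{t-1} ~ q(x_{t-1} | x_t, x0_pred).

    Assumes a mask-only forward process with P(masked at step t) = t/T.
    Under that assumption a masked position is revealed with probability
    1/t, and tokens that are already revealed are kept unchanged.
    """
    x_prev = []
    for tok, guess in zip(x_t, x0_pred):
        if tok == MASK and rng.random() < 1.0 / t:
            x_prev.append(guess)  # commit the current x_0 guess here
        else:
            x_prev.append(tok)    # stay masked, or keep the revealed token
    return x_prev


def sample(denoiser, length, num_steps, seed=0):
    """Iterative sampling: re-predict x_0 at every step, then step back."""
    rng = random.Random(seed)
    x_t = [MASK] * length              # x_T: everything is masked
    for t in range(num_steps, 0, -1):
        x0_pred = denoiser(x_t, t)     # p(x_0 | x_t), hypothetical callable
        x_t = posterior_step(x_t, x0_pred, t, rng)
    return x_t                         # at t = 1 every mask is revealed
```

The one-shot alternative asked about here would be a single call, `denoiser([MASK] * length, num_steps)`. In the loop, only a fraction of the positions is committed at each step, so later predictions of x_0 can condition on the tokens already revealed, which is where the refinement comes from.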