cientgu / VQ-Diffusion


Why not directly predict x_0 during inference instead of predicting iteratively? #11

Closed PanXiebit closed 2 years ago

PanXiebit commented 2 years ago

https://github.com/cientgu/VQ-Diffusion/blob/37bbcccdd4aef1794dac645128d864a9f69ed985/image_synthesis/modeling/transformers/diffusion_transformer.py?_pjax=%23js-repo-pjax-container#L186
https://github.com/cientgu/VQ-Diffusion/blob/37bbcccdd4aef1794dac645128d864a9f69ed985/image_synthesis/modeling/transformers/diffusion_transformer.py?_pjax=%23js-repo-pjax-container#L240

As shown at line 186, the transformer predicts x_0 from x_t at any timestep. At line 240, during inference, the step x_t -> x_{t-1} first predicts x_0 with p(x_0 | x_t) and then samples x_{t-1} from the posterior q(x_{t-1} | x_t, x_0).

So why not directly predict x_0 with p(x_0 | x_T) during inference?
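For reference, the two steps described above can be summarized in a minimal sketch. This is not the repository's actual code: `denoise_fn` and `q_posterior_sample` are placeholder names standing in for the transformer prediction at line 186 and the posterior computation at line 240.

```python
def sample_iterative(denoise_fn, q_posterior_sample, x_T, num_timesteps):
    """Minimal sketch of the iterative inference loop discussed in this issue.

    denoise_fn(x_t, t)                  -> prediction of x_0, i.e. p(x_0 | x_t)
    q_posterior_sample(x_0_pred, x_t, t)-> sample of x_{t-1} from q(x_{t-1} | x_t, x_0_pred)

    Both callables are hypothetical placeholders, not the repository's real
    function signatures.
    """
    x_t = x_T
    for t in reversed(range(num_timesteps)):
        # 1. Predict x_0 from the current noisy tokens (the line-186 step).
        x_0_pred = denoise_fn(x_t, t)
        # 2. Step back one timestep by sampling from q(x_{t-1} | x_t, x_0_pred)
        #    (the line-240 step). The x_0 prediction is redone at every step,
        #    so it can be refined as x_t becomes less noisy.
        x_t = q_posterior_sample(x_0_pred, x_t, t)
    return x_t
```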

cientgu commented 2 years ago

Because iteratively predicting x_0 refines the prediction step by step; this is the key idea of diffusion models.

PanXiebit commented 2 years ago

Thanks a lot, I understand. Since the x_0 predicted from full noise is not good, iteratively refining the prediction is necessary.
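To make the contrast concrete, reusing the hypothetical placeholder names from the sketch above: the "direct" prediction the question asks about would amount to a single call on the fully noised input, whereas the iterative loop re-predicts x_0 from a progressively less noisy x_t.

```python
# One-shot alternative (what the question proposes): predict x_0 once from the
# fully noised x_T and stop. The estimate comes from a single, maximally noisy
# input and is never refined.
x_0_one_shot = denoise_fn(x_T, num_timesteps - 1)

# Iterative sampling (what the paper and repository do): each pass conditions
# on a less noisy x_t, so the prediction of x_0 improves step by step.
x_0_iterative = sample_iterative(denoise_fn, q_posterior_sample, x_T, num_timesteps)
```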