Clarification Needed on Diffusion Process

fairchildfzc commented 3 months ago

Hi authors,

Thank you for your fascinating paper. However, I have a question regarding the diffusion process as illustrated in Figure 3 (https://github.com/czg1225/AsyncDiff/blob/main/assets/fig2.png?raw=true).

From the image and the paper, it appears that $x_{T-5}$ is generated directly from $x_{T-1}$. I am curious about the role of $x_{T-2}$ through $x_{T-4}$ in the calculation of $x_{T-5}$. Are these intermediate steps involved in the generation of $x_{T-5}$? If not, does this imply that $x_{T-2}$ to $x_{T-4}$ can be skipped, eliminating the need to compute them?

Thank you for your clarification.

czg1225 commented 3 months ago

@fairchildfzc, the output of the UNet is the predicted noise for the current timestep, so we can only obtain $x_{T-5}$ after obtaining $x_{T-4}$ and $noise_{T-4}$. The core of our method is that the noise predictions at different timesteps can now be performed in parallel rather than sequentially. However, the predicted noise at each step still needs to be computed.
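
To make the dependency concrete, here is a minimal sketch of a deterministic DDIM-style sampler loop in plain Python. It is not the AsyncDiff implementation and does not show the parallel scheduling: `alpha_bar`, `toy_noise_predictor`, `ddim_step`, `sample`, and the linear schedule are all hypothetical stand-ins chosen for illustration. It only shows that each $x_{t-1}$ is built from $x_t$ and the noise predicted at timestep $t$.

```python
import math

T = 10  # hypothetical number of timesteps


def alpha_bar(t):
    # Hypothetical linear schedule: 1.0 at t=0, decreasing to 0.1 at t=T.
    return 1.0 - 0.09 * t


def toy_noise_predictor(x_t, t):
    # Stand-in for the UNet (assumption): the output depends on both the
    # current sample x_t and the timestep t, like a time-embedded model.
    return [math.tanh(x) * (t / T) for x in x_t]


def ddim_step(x_t, eps_t, t):
    # Deterministic DDIM-style update: x_t and its predicted noise -> x_{t-1}.
    a_t, a_prev = alpha_bar(t), alpha_bar(t - 1)
    # Estimate the clean sample implied by x_t and the predicted noise.
    x0 = [(x - math.sqrt(1 - a_t) * e) / math.sqrt(a_t)
          for x, e in zip(x_t, eps_t)]
    # Move that estimate to the noise level of timestep t-1.
    return [math.sqrt(a_prev) * x0_i + math.sqrt(1 - a_prev) * e
            for x0_i, e in zip(x0, eps_t)]


def sample(x_start, t_start, t_end):
    # Walk from x_{t_start} down to x_{t_end}, one noise prediction per step.
    x = x_start
    trajectory = {t_start: x_start}
    for t in range(t_start, t_end, -1):
        eps = toy_noise_predictor(x, t)  # noise for the current timestep
        x = ddim_step(x, eps, t)         # needs both x_t and eps_t
        trajectory[t - 1] = x
    return trajectory


# Going from x_{T-1} to x_{T-5} takes four noise predictions (t = T-1 .. T-4).
traj = sample([0.5, -1.2, 0.3], t_start=T - 1, t_end=T - 5)
```

In this sketch `traj[T - 5]` is produced by `ddim_step(traj[T - 4], ...)`, so every intermediate sample still appears in the chain even if the noise predictions themselves were scheduled differently.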

fairchildfzc commented 3 months ago

Thank you very much for your clarification! I had not noticed that you use different time embeddings in different pipeline stages.
