fairchildfzc opened this issue 3 months ago.
@fairchildfzc, the output of the UNet is the predicted noise for the current timestep. We can only obtain $x_{T-5}$ after we have $x_{T-4}$ and its predicted noise $\text{noise}_{T-4}$. The core of our method is that the noise predictions for different timesteps can now be computed in parallel rather than sequentially. However, the predicted noise at every step still has to be computed, so $x_{T-2}$ through $x_{T-4}$ cannot be skipped.
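The data dependency described in the reply (each latent needs the previous latent plus the noise predicted from it) can be pictured with a plain sequential sampler. This is a minimal sketch under stated assumptions, not AsyncDiff's implementation: it uses the deterministic DDIM (eta = 0) update as a stand-in for whatever scheduler the project uses, a scalar in place of a latent tensor, a hypothetical `toy_predict_noise` in place of the UNet, and a hypothetical linear `alpha_bars` schedule. It models none of the parallelism; it only shows why $x_{T-5}$ is computed from $x_{T-4}$.

```python
import math
from typing import Callable, List

# Hypothetical stand-in for the UNet: maps (latent, timestep) -> predicted noise.
NoiseFn = Callable[[float, int], float]


def ddim_step(x_t: float, eps: float, abar_t: float, abar_prev: float) -> float:
    """Deterministic DDIM (eta = 0) update from x_t to x_{t-1}."""
    # Estimate the clean sample from the current latent and its predicted noise.
    x0_pred = (x_t - math.sqrt(1.0 - abar_t) * eps) / math.sqrt(abar_t)
    # Re-noise that estimate to the previous (less noisy) level.
    return math.sqrt(abar_prev) * x0_pred + math.sqrt(1.0 - abar_prev) * eps


def sample(x_T: float, alpha_bars: List[float], predict_noise: NoiseFn) -> List[float]:
    """Run from t = T down to 0 and return every latent; trajectory[k] is x_{T-k}.

    alpha_bars[t] is the cumulative signal level at timestep t (index 0 = cleanest).
    """
    T = len(alpha_bars) - 1
    trajectory = [x_T]
    x = x_T
    for t in range(T, 0, -1):
        eps = predict_noise(x, t)  # needs x_t, so it cannot run before x_t exists
        x = ddim_step(x, eps, alpha_bars[t], alpha_bars[t - 1])
        trajectory.append(x)
    return trajectory


def toy_predict_noise(x: float, t: int) -> float:
    # Hypothetical toy noise predictor, for illustration only.
    return 0.1 * x + 0.01 * t


if __name__ == "__main__":
    T = 10
    # Hypothetical schedule: alpha_bar falls linearly from 0.99 (t=0) to 0.5 (t=T).
    alpha_bars = [0.99 - 0.049 * t for t in range(T + 1)]
    traj = sample(1.0, alpha_bars, toy_predict_noise)
    # x_{T-5} = traj[5] is exactly one update applied to x_{T-4} = traj[4].
    t = T - 4
    recomputed = ddim_step(traj[4], toy_predict_noise(traj[4], t), alpha_bars[t], alpha_bars[t - 1])
    print(abs(recomputed - traj[5]) < 1e-12)
```

Dropping $x_{T-2}$ through $x_{T-4}$ would mean calling `ddim_step` fewer times with different inputs, i.e. a coarser sampler that generally follows a different trajectory rather than the same one computed faster.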
Thank you very much for the clarification! I had not noticed that you use different time embeddings in different pipeline stages.
Hi authors,
Thank you for your fascinating paper. However, I have a question regarding the diffusion process as illustrated in Figure 3 (https://github.com/czg1225/AsyncDiff/blob/main/assets/fig2.png?raw=true).
From the image and the paper, it appears that $x_{T-5}$ is generated directly from $x_{T-1}$. I am curious about the role of $x_{T-2}$ through $x_{T-4}$ in the calculation of $x_{T-5}$. Are these intermediate steps involved in the generation of $x_{T-5}$? If not, does this imply that $x_{T-2}$ to $x_{T-4}$ can be skipped, eliminating the need to compute them?
Thank you for your clarification.