Closed by he-nantian 3 weeks ago
Assume the underlying diffusion model is an epsilon-based model (i.e., it predicts the noise given x_t).
For diffusion-based sampling like DDPM/DDIM, the predicted noise is used to estimate x_prev (e.g., x_{t-1}) from the input x_t. The estimated x_prev is then fed back into the model to predict the noise again. Repeating this process is the multi-step sampling.
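The loop above can be sketched as follows for the deterministic DDIM case. This is a minimal illustration, not the repo's actual code: `model`, `timesteps`, and `alpha_bars` are placeholder names for an epsilon-prediction network, a decreasing list of timesteps, and the cumulative-product noise schedule.

```python
import numpy as np

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    # Recover the model's estimate of the clean sample from the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    # Deterministic DDIM (eta = 0): move x0_pred back to the previous noise
    # level along the same predicted noise direction.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps

def ddim_sample(model, x_T, timesteps, alpha_bars):
    # Multi-step sampling: predict noise, estimate x_prev, repeat.
    x = x_T
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        eps = model(x, t)  # predict the noise at the current step
        x = ddim_step(x, eps, alpha_bars[t], alpha_bars[t_prev])
    return x
```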
For consistency sampling, the model directly predicts x_0 from the input x_t. For multi-step sampling, the predicted x_0 is diffused (i.e., noise is added) back to x_{t-k}, where k is a predefined skip interval. The noised x_{t-k} is then used to predict x_0 again. Repeating this process is the multi-step sampling.
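A minimal sketch of this predict/re-noise loop, assuming the EDM-style parameterization where a sample at noise level sigma is x = x_0 + sigma * z. Here `f` and `sigmas` are illustrative placeholders (a consistency model mapping (x, sigma) to an x_0 estimate, and a decreasing list of noise levels spaced by the skip interval), not the repo's actual API:

```python
import numpy as np

def consistency_multistep(f, x_T, sigmas):
    # One-shot estimate of the clean sample from the initial noisy input.
    x0 = f(x_T, sigmas[0])
    for sigma in sigmas[1:]:
        z = np.random.randn(*np.shape(x0))  # fresh Gaussian noise
        x = x0 + sigma * z                  # diffuse x0 back to noise level sigma
        x0 = f(x, sigma)                    # refine the x0 estimate
    return x0
```

With a 2-element `sigmas` list this is exactly 2-step sampling; with 4 elements it is 4-step sampling, each step re-noising the current x_0 estimate and predicting again.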
Feel free to reopen this issue.
May I ask how the noise is added in detail? I have read your paper and the original Consistency Models paper, but neither explains it in detail. The Consistency Models paper says they chose the noise levels in a greedy way; is this the same method you used in your work? Thank you very much if you could reply to my question!
Great work! I wonder about the workflow of 2/4-step sampling. Taking 4-step sampling as an example, are the first 3 steps similar to DDPM/DDIM, with the last step using the consistency model?