Closed UdonDa closed 2 years ago
To get from x_0 to x_t you just do x_0 + noise * t (t is the same as sigma for the Karras forward process), where you draw noise using torch.randn(). Then to use the preconditioner, so that the input to the model is variance ~1, you multiply by c_in for that sigma. So the line you wrote is correct. :) Then you would feed in log(sigma) / 4 to another set of Fourier Features to condition the model on the noise level of the conditioning too.
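For anyone reading later, here is a minimal sketch of the steps described above, not the repository's actual code. The function name augment_conditioning, the local append_dims helper, the example sigma range, and the default sigma_data=0.5 are assumptions made for illustration. c_in follows the Karras et al. preconditioning, 1 / sqrt(sigma^2 + sigma_data^2).

```python
import torch


def append_dims(x, target_dims):
    """Append trailing singleton dims so x broadcasts against a target_dims-D tensor."""
    return x[(...,) + (None,) * (target_dims - x.ndim)]


def augment_conditioning(x_0, sigma, sigma_data=0.5):
    """Noise a conditioning image x_0 at per-sample noise level sigma.

    sigma_data is the assumed standard deviation of the clean data
    (0.5 here is an assumption, set it to match your dataset).
    Returns the scaled noisy input and the noise-level embedding input.
    """
    noise = torch.randn_like(x_0)                       # same shape as x_0
    sigma_b = append_dims(sigma, x_0.ndim)              # (B,) -> (B, 1, 1, 1)
    x_t = x_0 + noise * sigma_b                         # Karras forward process
    c_in = 1 / (sigma_b ** 2 + sigma_data ** 2) ** 0.5  # input preconditioner
    c_noise = sigma.log() / 4                           # goes to Fourier features
    return c_in * x_t, c_noise


# Example with a hypothetical batch of low-resolution conditioning images.
low_res = torch.randn(8, 3, 64, 64) * 0.5
sigma = torch.rand(8) * 0.5 + 0.01  # hypothetical augmentation noise levels
cond_in, cond_noise = augment_conditioning(low_res, sigma)
```

If x_0 has standard deviation close to sigma_data, then x_t has variance sigma_data^2 + sigma^2, so multiplying by c_in brings the model input back to roughly unit variance.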
Thanks!! I will try the conditioning!
Thank you so much for this guidance. I implemented this for img2img sampling for Stable Diffusion.
Hi, @crowsonkb !
I'm confused about the forward process from x_0 to x_t. Could you explain it to me? I'd like to implement the conditioning augmentation from the Imagen paper for super-resolution. It perturbs x_0 to obtain x_t and t (in your implementation, I guess t means sigma).
I understand that your implementation samples sigma randomly here and creates the noisy x_t samples here:
noised_input = c_in * (input + noise * utils.append_dims(sigma, input.ndim))
as shown in Eq. 7. So does noised_input mean the x_t images created by Karras' forward diffusion process?