When we train the conventional diffusion branch, the usual pipeline is: randomly sample noise -> add it to the image to get x_t -> apply the other conditions (text, face) -> the UNet predicts the noise -> compute the loss.
So we can reconstruct x_0 from x_t using the predicted noise, and then compare the faces.
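For concreteness, here is a minimal sketch of the training step I mean, assuming a standard DDPM epsilon-prediction setup; `unet`, `face_encoder`, and `alphas_cumprod` are illustrative placeholders, not the paper's actual code:

```python
import torch
import torch.nn.functional as F

# Sketch of the conventional-branch step described above (placeholder names,
# standard DDPM epsilon-prediction assumed, not the paper's actual code).
def conventional_branch_step(unet, face_encoder, x0, cond, alphas_cumprod):
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    noise = torch.randn_like(x0)                            # randomly sample noise
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise    # image + added noise
    eps_pred = unet(x_t, t, cond)                           # UNet predicts the noise
    diff_loss = F.mse_loss(eps_pred, noise)                 # compute the loss

    # Reconstruct x_0 from x_t with the predicted noise, then compare faces.
    x0_hat = (x_t - (1 - a_bar).sqrt() * eps_pred) / a_bar.sqrt()
    id_loss = 1 - F.cosine_similarity(face_encoder(x0_hat),
                                      face_encoder(x0), dim=-1).mean()
    return diff_loss + id_loss
```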
Why does the paper use the Lightning branch to generate images from pure noise? That step seems to have no relationship with the conventional branch.
My point is the "from pure noise" part, rather than using "x_t, text, face" to generate the image.
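To make the question concrete, this is what I understand "from pure noise" to mean: a few-step sampler that starts from x_T ~ N(0, I), with no real training image behind it. `lightning_unet` and `scheduler` are hypothetical names, and the step count is just an example:

```python
import torch

# Hedged sketch of generating "from pure noise": the few-step branch starts
# at pure Gaussian noise instead of an x_t derived from a training image.
# `lightning_unet`, `scheduler`, and `num_steps` are placeholders.
def lightning_branch_sample(lightning_unet, scheduler, cond, shape, num_steps=4):
    x = torch.randn(shape)                     # pure noise, no real image involved
    for t in scheduler.timesteps(num_steps):   # e.g. 4 distilled denoising steps
        eps = lightning_unet(x, t, cond)       # conditioned on text and face only
        x = scheduler.step(eps, t, x)          # one denoising update
    return x                                   # fully generated image for the face comparison
```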
Thank you.