JackeyDeng opened this issue 1 year ago
Hi. First, CFW is used to further improve the quality of the latents produced by the diffusion U-net. The information from the LR image is introduced through the encoder of the autoencoder. Second, w is set to 1 during training. We may add this detail to the paper.
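For context, here is a minimal sketch of how such a weighted fusion could look. This is not the StableSR implementation; the module and argument names (`CFWFusion`, `fusion_w`) and the residual form are illustrative assumptions, shown only to make the role of w concrete.

```python
import torch
import torch.nn as nn

class CFWFusion(nn.Module):
    """Illustrative CFW-style fusion: the decoder feature (from the diffusion
    latent) is adjusted by a weighted correction computed from the LR encoder
    feature. Not the authors' code."""

    def __init__(self, channels: int):
        super().__init__()
        # Learnable transform applied to the concatenated features.
        self.transform = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, dec_feat: torch.Tensor, enc_feat: torch.Tensor,
                fusion_w: float = 1.0) -> torch.Tensor:
        residual = self.transform(torch.cat([dec_feat, enc_feat], dim=1))
        # fusion_w trades fidelity (LR information) against generated detail.
        return dec_feat + fusion_w * residual

# During training w is fixed at 1; at inference it can be varied in [0, 1].
fusion = CFWFusion(channels=64)
dec_feat = torch.randn(1, 64, 64, 64)  # feature decoded from the U-net latent
enc_feat = torch.randn(1, 64, 64, 64)  # feature from the autoencoder encoder on the LR image
out = fusion(dec_feat, enc_feat, fusion_w=1.0)
```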
Hi, I am still a little confused. Why is the latent of the HR image needed when training CFW, rather than the latent of the LR image? I think the latter is consistent with the inference process and more plausible. Here is what the paper says, which I think is right; but here is what we actually need to train CFW, which I think is wrong:
latents └── 00000001.npy # Latent codes (N, 4, 64, 64) of HR images generated by the diffusion U-net, saved in .npy format.
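For reference, a small sketch of loading latents in that layout, assuming one `.npy` file per sample with shape (N, 4, 64, 64) as quoted above; the path is hypothetical.

```python
import numpy as np
import torch

# Hypothetical path matching the directory layout quoted above.
latent = np.load("latents/00000001.npy")           # expected shape: (N, 4, 64, 64)
latent = torch.from_numpy(latent).float()
assert latent.ndim == 4 and latent.shape[1] == 4   # 4-channel latent codes at 64x64
```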
Hello, I'm confused about the CFW training. Firstly, latent codes (4D tensors) of HR images are needed to train CFW, but your paper (4.1, Implementation Details) says: "Then we adopt the fine-tuned diffusion model to generate the corresponding latent codes Z0 given the above LR images as conditions." So what is actually needed to train CFW? I think the paper is right, so why do we need the HR-image latents rather than the LR-image latents? Secondly, how does fusion_weight work during training? Should we select fusion_w randomly in the range [0, 1]? I cannot find this in the code.