IceClear / StableSR

[IJCV2024] Exploiting Diffusion Prior for Real-World Image Super-Resolution
https://iceclear.github.io/projects/stablesr/

CFW training #49

Open JackeyDeng opened 1 year ago

JackeyDeng commented 1 year ago

Hello, I'm confused about the CFW training. First, latent codes (4D tensors) of HR images are needed to train CFW, but your paper (Sec. 4.1, Implementation Details) says: "Then we adopt the fine-tuned diffusion model to generate the corresponding latent codes Z0 given the above LR images as conditions." So what exactly is needed to train CFW? I think the paper is right, so why do we need the HR image latents rather than the LR image latents? Second, how does the fusion weight work during training? Should we select fusion_w randomly in the range [0, 1]? I cannot find this in the code.

IceClear commented 1 year ago

Hi. First, CFW is used to further improve the quality of the latents from the diffusion UNet. The LR information is introduced by the encoder of the autoencoder. Second, w is set to 1 during training. We may add this detail to the paper.
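To make the reply concrete, here is a minimal sketch of the fusion described above: CFW adds a learned residual computed from the autoencoder's encoder features (carrying the LR information) and the decoder features, scaled by the weight w, i.e. roughly F_m = F_d + C(F_e, F_d) * w with w = 1 at training time. The module name `CFWSketch` and the exact layers inside `C` are illustrative assumptions, not the repo's actual implementation.

```python
import torch
import torch.nn as nn

class CFWSketch(nn.Module):
    """Illustrative sketch of Controllable Feature Wrapping:
    F_m = F_d + C(F_e, F_d) * w, with w = 1 during training.
    The conv block standing in for C is a hypothetical choice."""

    def __init__(self, channels: int):
        super().__init__()
        # C: a small conv block over concatenated encoder/decoder features
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor,
                w: float = 1.0) -> torch.Tensor:
        # Residual conditioned on both feature maps, scaled by w
        residual = self.fuse(torch.cat([enc_feat, dec_feat], dim=1))
        return dec_feat + w * residual

cfw = CFWSketch(64)
enc = torch.randn(1, 64, 32, 32)   # encoder features of the LR input
dec = torch.randn(1, 64, 32, 32)   # decoder features from the latent
out = cfw(enc, dec, w=1.0)
```

With w = 0 the module passes the decoder features through untouched, which is why w can later trade fidelity against quality at inference time.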

JackeyDeng commented 1 year ago

Hi, I am still a little confused. Why is the latent of the HR image needed when training CFW, rather than the latent of the LR image? I think the latter is consistent with the inference process and more plausible. Here is what the paper says, which I think is right: [screenshot of Sec. 4.1] But here is what we need to train CFW, which I think is wrong:

```
latents
└── 00000001.npy  # Latent codes (N, 4, 64, 64) generated by the diffusion U-net from HR images, saved in .npy format.
```
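For reference, the data layout above can be sanity-checked with a short snippet. This just illustrates the expected `.npy` format (latent codes of shape (N, 4, 64, 64)); the file name and the use of a temporary directory are assumptions for a self-contained example.

```python
import os
import tempfile

import numpy as np

# Simulate one latent file matching the documented layout:
# (N, 4, 64, 64) latent codes from the fine-tuned diffusion UNet.
z0 = np.random.randn(8, 4, 64, 64).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), "00000001.npy")
np.save(path, z0)

loaded = np.load(path)
# The CFW training data loader would expect exactly this shape.
print(loaded.shape)
```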