cswry / OSEDiff


About the implementation of the method #10

Open ChenhLiwnl opened 2 months ago

ChenhLiwnl commented 2 months ago

Hello! I've also emailed you the same question, but you seem to have missed it. I've read your paper "One-Step Effective Diffusion Network for Real-World Image Super-Resolution", found it very interesting, and tried to reproduce it following the paper. However, I found the pseudocode provided in the appendix (Algorithm 1) a little confusing. Based on my understanding of the paper, E_\phi and E_\theta in line 2 should be E_\phi' and E_\phi respectively, since E_\phi is the pretrained model and we shouldn't re-initialize it. E_\theta and E_\theta' in line 13 should likewise be E_\phi and E_\phi', and E_\theta' in line 14 should be E_\phi', which is consistent with the symbols used in Eq. 7. Am I right about this? Thank you!

Also, I have another question. Is the frozen regularizer used in the VSD loss exactly the pretrained model, i.e. SD2.1? And is the trainable regularizer initialized from the pretrained model with LoRA? If so, I would expect the VSD loss to be almost zero at the beginning of training. I'm not sure whether my understanding is correct; please correct me if not.

theEricMa commented 1 month ago

Hi, thanks for your interest in our work and your question. I'm the third author of the paper, let me address your questions.

  1. Thank you for pointing that out. You are right: E_\theta and E_\theta' in line 13 should be E_\phi and E_\phi', and E_\theta' in line 14 should be E_\phi'. We will correct this in the next version.

  2. Yes, the frozen regularizer used in the VSD loss is indeed the pre-trained model, specifically SD2.1-base, and the trainable regularizer is initialized from the pre-trained model with LoRA. However, the gradient is not zero at the start of training. Following the official VSD implementation, the classifier-free guidance (CFG) scale for the pre-trained regularizer is set greater than 1, typically 7.5, as in text-to-image generation, while the CFG scale for the fine-tuned regularizer is set to 1. This mismatch makes the VSD loss effective even at the beginning of training, when the two regularizers share the same weights. Our experiments show that setting the CFG scale to 7.5 for both regularizers does not yield results as good as following VSD's implementation.
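To make point 2 concrete, here is a minimal numpy sketch (identifiers are illustrative, not from the OSEDiff codebase) of why the VSD gradient is nonzero at initialization: with a shared network, CFG = 7.5 on the frozen side and CFG = 1 on the LoRA side leave a residual proportional to the conditional/unconditional difference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy noise predictions from ONE shared network at initialization
# (the LoRA regularizer starts as an exact copy of the pre-trained model).
eps_cond = rng.standard_normal(4)    # prediction with the text prompt
eps_uncond = rng.standard_normal(4)  # prediction with the empty prompt

def cfg(eps_c, eps_u, scale):
    """Classifier-free guidance: blend conditional/unconditional predictions."""
    return eps_u + scale * (eps_c - eps_u)

# Pre-trained (frozen) regularizer uses CFG = 7.5; fine-tuned one uses CFG = 1.
eps_pretrained = cfg(eps_cond, eps_uncond, 7.5)
eps_finetuned = cfg(eps_cond, eps_uncond, 1.0)  # scale 1 reduces to eps_cond

# The VSD gradient direction is the difference of the two predictions.
vsd_grad = eps_pretrained - eps_finetuned

# Algebraically this equals 6.5 * (eps_cond - eps_uncond): nonzero at init
# whenever the conditional and unconditional predictions differ.
print(np.allclose(vsd_grad, 6.5 * (eps_cond - eps_uncond)))  # True
```

With equal CFG scales on both sides, this difference would be exactly zero at initialization, which matches the intuition in the original question; the scale mismatch is what provides a training signal from step one.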

I hope this clarifies your questions!