MC-E / DragonDiffusion


Question about Cross-branch Self-attention #10

Open jiawei-liu1103 opened 11 months ago

jiawei-liu1103 commented 11 months ago

Hi, thanks for the interesting work! According to Sec. 3.4, W_{Q, K, V} are learnable projection matrices, so does DragonDiffusion need to fine-tune the projection matrices for both branches? Or am I misunderstanding something? If fine-tuning is really needed, is its loss function the same as the one applied to the latent z_t? Thanks!

MC-E commented 11 months ago

No, our method does not require fine-tuning these matrices. Thank you for your question; the wording in the paper is ambiguous. We meant that these matrices are part of the pre-trained SD model. We will correct this in the paper.

jiawei-liu1103 commented 11 months ago

Hi, thanks for your quick reply! I'm sorry, I still don't follow. In Figure 2, the K and V of the guidance branch replace the K and V of the generation branch. If these two projections are not fine-tuned, does that mean the two branches are initialized with different parameters? If so, could you tell us how the two branches are initialized? Or are W_K and W_V the same in both branches?

MC-E commented 11 months ago

Both branches are initialized from the pre-trained SD. No model needs to be fine-tuned.
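
For anyone landing here with the same question, here is a minimal sketch of the idea (not the authors' code): both branches run the same frozen pre-trained SD attention layer, and the generation branch simply consumes the guidance branch's keys and values. The function and variable names, and the diffusers-style `to_q`/`to_k`/`to_v`/`to_out` attributes, are assumptions for illustration; multi-head splitting is omitted for brevity.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()  # nothing is trained: all projections stay frozen
def cross_branch_self_attention(attn, z_gen, z_gui):
    """Queries from the generation branch; keys/values from the guidance branch.

    attn  : a pre-trained SD self-attention block (shared by both branches,
            assumed to expose to_q / to_k / to_v / to_out, i.e. W_{Q,K,V,O})
    z_gen : hidden states of the generation branch, shape (B, N, C)
    z_gui : hidden states of the guidance branch,  shape (B, N, C)
    """
    q = attn.to_q(z_gen)   # W_Q applied to the generation branch
    k = attn.to_k(z_gui)   # W_K applied to the guidance branch (replaces the generation K)
    v = attn.to_v(z_gui)   # W_V applied to the guidance branch (replaces the generation V)
    out = F.scaled_dot_product_attention(q, k, v)
    return attn.to_out[0](out)   # output projection, also from the pre-trained SD
```

Because the same pre-trained projections are applied in both branches, the only change relative to standard self-attention is which branch's hidden states feed the K/V projections, so no new parameters are introduced and nothing needs fine-tuning.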