ankanbhunia / PIDM

Person Image Synthesis via Denoising Diffusion Model (CVPR 2023)
https://ankanbhunia.github.io/PIDM
MIT License
483 stars, 62 forks

About the implementation on multi-scale condition. #41

Open XiaoqiangZhou opened 1 year ago

XiaoqiangZhou commented 1 year ago

Thanks for sharing this great work.

In the paper, you mention that the model can "transfer rich multi-scale texture patterns from the source image distribution to the noise prediction".

However, in the code I find that only the last-layer feature of the encoder is used for cross attention, as the [-1] index shows: pose_out = self.cros_attn2(x = xt_feats[-1], cond = pose_feats[-1]).mean([2,3])

Could you please briefly tell me where the "multi-scale" feature for cross attention is implemented?
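
To make the distinction in the question concrete, here is a minimal sketch contrasting cross attention on the last scale only with cross attention applied at every scale. It is not taken from the PIDM code: the helpers `cross_attention` and `make_pyramid`, the token counts, and the feature dimension are all hypothetical, and plain Python lists stand in for tensors.

```python
import math
import random

def cross_attention(queries, context):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries: list of d-dimensional vectors (e.g. noisy-image tokens).
    context: list of d-dimensional vectors used as both keys and values
             (e.g. condition tokens). Returns one output vector per query.
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[i] for w, v in zip(weights, context)) for i in range(d)])
    return outputs

def make_pyramid(token_counts, dim, seed):
    """Hypothetical feature pyramid: one list of `dim`-dimensional tokens per scale."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
            for n in token_counts]

# Three scales, fine to coarse: 16, 4 and 1 spatial positions (assumed sizes).
xt_feats = make_pyramid([16, 4, 1], dim=8, seed=0)
cond_feats = make_pyramid([16, 4, 1], dim=8, seed=1)

# Last-scale-only conditioning, analogous to indexing both lists with [-1].
last_only = cross_attention(xt_feats[-1], cond_feats[-1])

# Multi-scale conditioning: attend at every scale, pairing equal resolutions.
multi_scale = [cross_attention(x, c) for x, c in zip(xt_feats, cond_feats)]

print(len(last_only), [len(level) for level in multi_scale])
```

In this sketch the last entry of `multi_scale` is exactly what the `[-1]` version computes, so the single-scale call is a special case of the multi-scale loop. Whether the repository applies such a loop elsewhere is what the question asks.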

XiaoqiangZhou commented 1 year ago

Well, I think the actual main model is the class "BeatGANsAutoencModel" rather than "BeatGANsPoseGuideModel", and the multi-scale condition features are stored in the variables "enc_cond_emb", "mid_cond_emb", and "dec_cond_emb". Is that right?
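
One way to picture the guess above is a container that holds one condition feature per resolution and hands each to the matching stage of a U-Net-style denoiser. The sketch below only illustrates that idea. The field names are borrowed from the comment, while the class `MultiScaleCondition`, the function `route_conditions`, the shapes, and the fine-to-coarse ordering are assumptions, not the repository's actual structure.

```python
from dataclasses import dataclass
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width) stands in for a feature map

@dataclass
class MultiScaleCondition:
    """Hypothetical container; only the field names come from the comment."""
    enc_cond_emb: List[Shape]  # one entry per encoder resolution, fine to coarse
    mid_cond_emb: Shape        # bottleneck resolution
    dec_cond_emb: List[Shape]  # one entry per decoder resolution, coarse to fine

def route_conditions(cond, resolutions):
    """List (stage, resolution, condition) in the order a U-Net would visit them."""
    plan = []
    for res, emb in zip(resolutions, cond.enc_cond_emb):
        plan.append(("enc", res, emb))
    plan.append(("mid", resolutions[-1], cond.mid_cond_emb))
    for res, emb in zip(reversed(resolutions), cond.dec_cond_emb):
        plan.append(("dec", res, emb))
    return plan

# Assumed shapes for three resolutions: 64, 32 and 16 pixels per side.
cond = MultiScaleCondition(
    enc_cond_emb=[(128, 64, 64), (256, 32, 32), (512, 16, 16)],
    mid_cond_emb=(512, 16, 16),
    dec_cond_emb=[(512, 16, 16), (256, 32, 32), (128, 64, 64)],
)
plan = route_conditions(cond, [64, 32, 16])
for stage, res, emb in plan:
    print(stage, res, emb)
```

Under this reading, every stage receives a condition feature at its own resolution, which would be consistent with the "multi-scale" wording in the paper.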