This question was raised during our rebuttal period. We post our answer here for clarification. Thanks for your attention to our work :)
Our PRD learns the prompt embeddings implicitly. Although the learnable queries are not constrained to focus on specific semantics, it is interesting that the four query visualizations we picked out (of the $Q=16$ queries in Tab. 1) happen to align well with meaningful parts of the human body. We believe that introducing additional prior knowledge or constraints into the PRD could further help it learn human semantics; we leave this to future work.
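As a rough illustration (not our exact PRD implementation), the general mechanism is a DETR-style decoder in which a set of learnable queries cross-attends to image patch features; each query specializes only implicitly through the training objective. The module name, query count, layer count, and dimensions below are placeholders for the sketch:

```python
import torch
import torch.nn as nn

class PromptDecoderSketch(nn.Module):
    """Minimal sketch of learnable queries cross-attending to image features.

    This is an illustrative assumption of the general DETR-style mechanism,
    not the actual PRD code. Nothing here forces a query to attend to a
    specific body part; any specialization emerges during training.
    """
    def __init__(self, num_queries=16, dim=768, num_layers=2, num_heads=8):
        super().__init__()
        # Q learnable query embeddings (the implicit "prompts")
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        layer = nn.TransformerDecoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, patch_features):
        # patch_features: [batch_size, num_patches, dim] from an image encoder
        b = patch_features.size(0)
        queries = self.queries.unsqueeze(0).expand(b, -1, -1)  # [B, Q, dim]
        # Each query cross-attends to all patch tokens and aggregates whatever
        # evidence lowers the downstream training loss (e.g. a ReID objective).
        return self.decoder(tgt=queries, memory=patch_features)  # [B, Q, dim]

if __name__ == "__main__":
    model = PromptDecoderSketch(num_queries=16, dim=768)
    feats = torch.randn(2, 196, 768)  # dummy ViT-like patch features
    print(model(feats).shape)         # torch.Size([2, 16, 768])
```

The key point of the sketch is that the query-to-semantic mapping is not hard-coded anywhere; it is only observed post hoc through attention visualizations.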
Hi, I see you wrote in your paper: "By revisiting how people perceive a person image, we find several common characteristics, i.e., human body parts, age, gender, hairstyle, clothing, and so on, as demonstrated in Fig. 1(a)." But how do you make sure that these transformer blocks extract the right information that you want and not something else? The shape of the hidden_states that the decoder outputs is [batch_size, 8, 768]. I want to know how these eight kinds of information are decoupled from other, irrelevant information.
Thanks very much!