ButoneDream closed this issue 7 months ago.
This can be seen as a simple trick to strengthen the pose information. I don't think it makes much of a difference; it mainly eases the difficulty of learning pose patterns. It is also a commonly used strategy, and we are not the first to use it: the previous work PIDM adopts it as well, as discussed in https://github.com/ankanbhunia/PIDM/issues/6. Thanks for your attention to our work : )
I am curious about a design choice in your code. The build_pose_img function concatenates pose_img and pose_map, which produces a tensor with 21 channels. I expected the function to return pose_img directly, with its 3 channels. What is the rationale for using 21 channels instead?
What is the purpose of concatenating pose_img with pose_map, and how does it benefit the overall model or application?
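To make the 21-channel layout concrete, here is a minimal NumPy sketch of the general idea. It is not the repository's code. It assumes 18 keypoints (an OpenPose/COCO-18 style layout, inferred from 21 − 3 = 18) stored as (y, x) pairs. It renders a 3-channel RGB skeleton image and one Gaussian heatmap per keypoint, then stacks them along the channel axis. The names draw_skeleton, keypoints_to_heatmaps, and build_pose_input, the limb list, the per-limb colors, and sigma=6.0 are all hypothetical choices made for illustration.

```python
import numpy as np

# Assumption: 18 keypoints (OpenPose/COCO-18 style), so 3 RGB + 18 heatmaps = 21 channels.
NUM_KEYPOINTS = 18


def keypoints_to_heatmaps(coords, height, width, sigma=6.0):
    """Render one Gaussian heatmap per keypoint, shape (K, H, W).

    coords is a list of (y, x) pairs; a keypoint marked (-1, -1) is treated
    as undetected and leaves its channel all zeros.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((len(coords), height, width), dtype=np.float32)
    for k, (y, x) in enumerate(coords):
        if y < 0 or x < 0:
            continue  # undetected keypoint: empty channel
        maps[k] = np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2.0 * sigma ** 2))
    return maps


def draw_skeleton(coords, height, width, limbs):
    """Hypothetical skeleton renderer: a (3, H, W) RGB image in [0, 1]
    where each limb is drawn as a thin line in its own color."""
    img = np.zeros((3, height, width), dtype=np.float32)
    for i, (a, b) in enumerate(limbs):
        (ya, xa), (yb, xb) = coords[a], coords[b]
        if min(ya, xa, yb, xb) < 0:
            continue  # skip limbs with an undetected endpoint
        # Simple deterministic color per limb (not any real codebase's palette).
        color = np.array([(37 * i) % 256, (91 * i + 80) % 256, (151 * i + 160) % 256]) / 255.0
        for t in np.linspace(0.0, 1.0, num=64):
            y = int(round(ya + t * (yb - ya)))
            x = int(round(xa + t * (xb - xa)))
            if 0 <= y < height and 0 <= x < width:
                img[:, y, x] = color
    return img


def build_pose_input(coords, height, width, limbs):
    """Concatenate the RGB skeleton (3 ch) with per-keypoint heatmaps (K ch)."""
    pose_img = draw_skeleton(coords, height, width, limbs)   # (3, H, W)
    pose_map = keypoints_to_heatmaps(coords, height, width)  # (K, H, W)
    return np.concatenate([pose_img, pose_map], axis=0)      # (3 + K, H, W)


# Usage with made-up keypoints on a 256 x 176 canvas.
H, W = 256, 176
coords = [(20 + 12 * k, 40 + 5 * k) for k in range(NUM_KEYPOINTS)]
coords[3] = (-1, -1)                      # pretend keypoint 3 was not detected
limbs = [(0, 1), (1, 2), (2, 3), (1, 5)]  # illustrative subset of limb pairs
pose = build_pose_input(coords, H, W, limbs)
print(pose.shape)  # (21, 256, 176)
```

One plausible reading of "strengthening the pose information" is that the two parts are complementary. The skeleton image shows how joints connect, but a network has to infer which colored stroke belongs to which joint. Each heatmap channel states explicitly where one specific joint is, and it stays empty when that joint is missing.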
Another question: what is the difference between these two images (img_src and img_cond)? Which image is used for training?