YanzuoLu / CFLD

[CVPR 2024 Highlight] Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
MIT License

Code question about decoder. #30

Closed by justinday123 2 weeks ago

justinday123 commented 1 month ago

In your paper, the Perception-Refined Decoder uses the source image encoder, so I expected the appearance encoder to be used. In your code, however, the decoder is passed 'down_block_additional_residuals', which comes from the pose encoder. Why is that?

    def forward(self, batched_inputs):
        mask = batched_inputs["mask"] if "mask" in batched_inputs else None
        x, features = self.backbone(batched_inputs["img_cond"], mask=mask)
        up_block_additional_residuals = self.appearance_encoder(features)

        bsz = x.shape[0]
        if self.training:
            bsz = bsz * 2
            down_block_additional_residuals = self.pose_encoder(torch.cat([batched_inputs["pose_img_src"], batched_inputs["pose_img_tgt"]]))
            up_block_additional_residuals = {k: torch.cat([v, v]) for k, v in up_block_additional_residuals.items()}
            # why does self.decoder use pose_encoder?
            c = self.decoder(x, features, down_block_additional_residuals)

YanzuoLu commented 2 weeks ago

https://github.com/YanzuoLu/CFLD/blob/9892b2fd88ec05327b6a825182388f7185daf454/models/decoder.py#L162

pose_query was only used in experiments; it is not enabled in the final method.
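
To illustrate the reply, here is a minimal sketch of an input that is accepted but ignored when an experimental switch is off. It assumes pose_query behaves like a boolean flag, which the reply suggests but this thread does not show. `ToyDecoder`, its arithmetic, and the sample values are hypothetical stand-ins and are not taken from the CFLD code.

```python
class ToyDecoder:
    """Hypothetical stand-in for illustration only; not the CFLD decoder."""

    def __init__(self, pose_query=False):
        # Assumption: pose_query is an on/off switch for an experimental path.
        self.pose_query = pose_query

    def forward(self, x, features, down_block_additional_residuals=None):
        # Appearance path: always combined with the input.
        out = [xi + fi for xi, fi in zip(x, features)]
        # Pose path: only touched when the experimental switch is on.
        if self.pose_query and down_block_additional_residuals is not None:
            out = [oi + pi for oi, pi in zip(out, down_block_additional_residuals)]
        return out


x = [1.0, 2.0]
features = [0.5, 0.5]
pose = [10.0, 10.0]

final = ToyDecoder(pose_query=False)        # switch off, as the reply describes
experimental = ToyDecoder(pose_query=True)  # switch on, experimental variant

without_pose = final.forward(x, features)
with_pose_ignored = final.forward(x, features, pose)
with_pose_used = experimental.forward(x, features, pose)
```

With the switch off, `without_pose` and `with_pose_ignored` are identical, so passing the pose residuals in this toy setup changes nothing. Only the experimental variant produces a different result.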