YanzuoLu / CFLD

[CVPR 2024 Highlight] Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
MIT License

The code of loss function #15

Closed CHNxindong closed 5 months ago

CHNxindong commented 5 months ago

Thanks for your great work and for releasing the code!

I have two questions about the code in pose_transfer_train.py:

  1. The paper describes two losses, a reconstruction loss and an MSE loss [equation screenshots], but the implementation computes the loss in a single line [code screenshot].

  2. Why are pose_img_src and pose_img_tgt concatenated before being fed to the pose encoder? And why are img_src and img_tgt concatenated as the input? [code screenshots]

ButoneDream commented 5 months ago

I would like to know this as well.

YanzuoLu commented 5 months ago

I think your second question already contains the answer to your first. Note that the only difference between eq. (3) and eq. (5) is the target pose: the two target poses correspond to the source image and the target image of the training pair, respectively. In the diffusion model, the latents sent to the UNet during training are the noisy target images, i.e., the noisy form of the ground-truth image, so concatenating the source and target pairs along the batch dimension lets a single loss line cover both terms. Hope this helps.
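To make the batch-concatenation trick concrete, here is a minimal, hypothetical PyTorch sketch (toy shapes, a placeholder denoiser instead of the real UNet, and plain additive noise instead of the scheduler's `add_noise`). It is not the repository's actual code; it only illustrates why one MSE line over the concatenated batch equals the average of the two per-pair losses in eq. (3) and eq. (5):

```python
import torch
import torch.nn.functional as F

B, C, H, W = 2, 4, 8, 8  # toy latent shapes (hypothetical)

# Toy stand-ins for the source/target image latents and pose features.
img_src, img_tgt = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
pose_src, pose_tgt = torch.randn(B, C, H, W), torch.randn(B, C, H, W)

# Concatenate along the batch dimension: the first half of the batch is the
# (source pose, source image) pair, the second half the (target pose,
# target image) pair.
latents = torch.cat([img_src, img_tgt], dim=0)  # ground-truth latents
poses = torch.cat([pose_src, pose_tgt], dim=0)  # pose conditioning

noise = torch.randn_like(latents)
noisy_latents = latents + noise  # stand-in for the scheduler's add_noise


def toy_denoiser(x, cond):
    # Placeholder for the UNet: a real model would predict the added noise
    # from the noisy latents and the pose conditioning.
    return x + 0.0 * cond


pred = toy_denoiser(noisy_latents, poses)

# One MSE over the doubled batch equals the mean of the two per-pair
# losses, since both halves have the same number of elements.
loss = F.mse_loss(pred, noise)
loss_src = F.mse_loss(pred[:B], noise[:B])  # eq. (3) analogue
loss_tgt = F.mse_loss(pred[B:], noise[B:])  # eq. (5) analogue
assert torch.allclose(loss, 0.5 * (loss_src + loss_tgt))
```

So the single loss line in pose_transfer_train.py is not dropping a term; the doubled batch makes it compute both losses at once.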

CHNxindong commented 5 months ago

I see it. Thanks for your quick and patient reply.