IDEA-Research / HumanSD

[ICCV 2023] The official implementation of paper "HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation"
Apache License 2.0

About the heatmap usage #30

Open resindraburiza opened 10 months ago

resindraburiza commented 10 months ago

Dear authors of HumanSD,

First of all, I would like to thank you for sharing your work. I have read both the paper and the code, found some parts I could not understand, and would like to ask a few questions about them here.

Regarding the heatmap loss: Eq. 6 of the paper states that $W_a$ is a weight that gives higher priority to the loss in regions strongly correlated with the input condition.
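For reference, the weighting applied by the `loss_simple` line quoted in question 2 can be written out as follows. This is only a reading of the code, not a transcription of the paper's Eq. 6. It assumes the L2 loss type (so `get_loss(..., mean=False)` returns per-element squared error), writes $y$ for `target`, $\hat{y}$ for `model_output`, $\lambda$ for `self.pose_loss_weight`, $M$ for `back_to_embed_pose_add_weight`, and $C, H, W$ for the latent dimensions averaged by `.mean([1, 2, 3])`:

$$
\mathcal{L}_{\text{simple}} = \frac{1}{CHW} \sum_{k=1}^{C} \sum_{i=1}^{H} \sum_{j=1}^{W} \bigl(1 + \lambda\, M_{kij}\bigr) \bigl(\hat{y}_{kij} - y_{kij}\bigr)^2
$$

If Eq. 6 describes the same computation, $W_a$ would correspond to $1 + \lambda M$.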

From Figure 2, my understanding is that the heatmap is applied as a simple multiplicative weight or mask.

However, after checking the code, it seems that the heatmap is not used directly as a simple mask. Once the heatmap is obtained, it is passed through the VAE encoder here: https://github.com/IDEA-Research/HumanSD/blob/c5db29dd66a3e40afa8b4bed630f0aa7ea001880/ldm/models/diffusion/ddpm.py#L2011. The resulting embedding is then used to weight the loss here: https://github.com/IDEA-Research/HumanSD/blob/c5db29dd66a3e40afa8b4bed630f0aa7ea001880/ldm/models/diffusion/ddpm.py#L2026
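To make the "simple mask" reading of Figure 2 concrete, here is a minimal sketch. It assumes a pixel-space heatmap in [0, 1] and Stable Diffusion's 8x spatial downsampling into a 4-channel latent. `pooled_mask` and `masked_loss` are hypothetical helpers, not code from the HumanSD repository. They only show that plain average pooling already produces a mask that broadcasts against a latent-shaped loss, which is why the VAE-encoding step raised in question 1 is surprising.

```python
import numpy as np

def pooled_mask(heatmap: np.ndarray, factor: int = 8) -> np.ndarray:
    """Average-pool an (H, W) pixel-space heatmap down to latent resolution (H/factor, W/factor)."""
    h, w = heatmap.shape
    if h % factor or w % factor:
        raise ValueError("heatmap height and width must be divisible by factor")
    return heatmap.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def masked_loss(model_output: np.ndarray, target: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Per-sample squared error weighted by a plain mask (hypothetical helper).

    model_output/target: (B, C, h, w); mask: (h, w), broadcast over batch and channels.
    """
    per_elem = (model_output - target) ** 2
    return (per_elem * mask).mean(axis=(1, 2, 3))

# Toy example: a 512x512 heatmap with one square "pose" region.
heatmap = np.zeros((512, 512))
heatmap[160:320, 160:320] = 1.0
mask = pooled_mask(heatmap)                  # (64, 64); ones on [20:40, 20:40], zeros elsewhere

rng = np.random.default_rng(0)
model_output = rng.standard_normal((2, 4, 64, 64))
target = model_output.copy()
target[..., :10, :10] += 1.0                 # introduce error only outside the masked square
loss = masked_loss(model_output, target, mask)
print(loss)                                  # [0. 0.]: errors outside the mask are ignored
```

With a purely multiplicative mask like this, latent positions where the mask is zero contribute nothing to the loss.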

My questions are the following:

  1. Why is it necessary to pass the heatmap through the VAE encoder?
  2. Why is the `1+` term needed in `loss_simple = torch.mul(self.get_loss(model_output, target, mean=False),(1+self.pose_loss_weight*back_to_embed_pose_add_weight)).mean([1, 2, 3])`?
  3. Regarding how the heatmap is obtained in this part of the code: https://github.com/IDEA-Research/HumanSD/blob/c5db29dd66a3e40afa8b4bed630f0aa7ea001880/ldm/models/diffusion/ddpm.py#L1998. As computed, pixels whose values are greater than the threshold are set to zero, and all other pixels are set to one. I expected the usual approach of assigning 1 to pixels above the threshold. Why is the inverse used here?
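Questions 2 and 3 can be illustrated with a small sketch. It assumes the L2 loss type, so that `get_loss(..., mean=False)` returns per-element squared error, and it uses hypothetical names (`binarize`, `weighted_loss`). It mirrors the quoted `loss_simple` expression but is not the repository's code. It shows that the inverted threshold is simply the complement of the usual binary mask, and that the `1 +` term keeps a baseline weight of 1 everywhere, so regions where the weight map is zero still receive the ordinary loss instead of being dropped.

```python
import numpy as np

def binarize(heatmap: np.ndarray, threshold: float, inverted: bool = False) -> np.ndarray:
    """Return a 0/1 map from a heatmap (hypothetical helper).

    inverted=False: 1 where heatmap > threshold (the usual convention).
    inverted=True:  0 where heatmap > threshold, 1 elsewhere (the behavior described in question 3).
    """
    above = heatmap > threshold
    return (~above if inverted else above).astype(np.float64)

def weighted_loss(model_output, target, weight_map, pose_loss_weight):
    """Per-sample mean over (C, H, W) of squared error * (1 + pose_loss_weight * weight_map)."""
    per_elem = (model_output - target) ** 2          # assumed L2 loss, no reduction
    return (per_elem * (1.0 + pose_loss_weight * weight_map)).mean(axis=(1, 2, 3))

heatmap = np.array([[0.1, 0.6], [0.9, 0.3]])
usual = binarize(heatmap, 0.5)                       # [[0, 1], [1, 0]]
inverted = binarize(heatmap, 0.5, inverted=True)     # [[1, 0], [0, 1]], i.e. 1 - usual

rng = np.random.default_rng(0)
out = rng.standard_normal((2, 4, 2, 2))
tgt = rng.standard_normal((2, 4, 2, 2))
plain = ((out - tgt) ** 2).mean(axis=(1, 2, 3))      # unweighted per-sample MSE

# With "1 +", an all-zero weight map reproduces the plain loss; without it, the loss would be zero.
print(np.allclose(weighted_loss(out, tgt, np.zeros((2, 2)), 5.0), plain))  # True
```

In this sketch the weight map only adds emphasis on top of the plain loss, rather than switching the loss off outside the highlighted regions.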

I would really appreciate it if you could help me understand your work correctly. Thank you very much.

xiao2mo commented 8 months ago

Same question here.

neil0306 commented 4 months ago

Same here. Any suggestions?