eric-ai-lab / photoswap

Official implementation of the NeurIPS 2023 paper "Photoswap: Personalized Subject Swapping in Images"
https://photoswap.github.io
MIT License
342 stars 24 forks

questions about the code #15

Open nancy6o6 opened 2 months ago

nancy6o6 commented 2 months ago
  1. I found the following code in the modified cross-attention module:

    https://github.com/eric-ai-lab/photoswap/blob/570ca0de4866ed2f01ca274c1d016c0af01c119d/utils.py#L235-L241

    This seems to make attn_base and att_replace identical in swapping_class.replace_self_attention. It seems that self_map_replace_steps should cover self_output_replace_steps. Is that correct?

  2. I have difficulty understanding the line mask = mask[:1] + mask in swapping_class.LocalBlend.get_mask. My understanding is that mask[0] is the mask extracted during the reference (reconstruction) process and mask[1] is the mask extracted during the editing process. What does this line do?
  3. I want to confirm whether the four entries of qkv in question 1 correspond to [uncond_emb, uncond_emb, source_emb, target_emb].

I would appreciate it if anyone could answer these questions.
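
To make the question about mask = mask[:1] + mask concrete, here is a minimal NumPy sketch of what that line would compute if mask were a boolean array of shape (batch, 1, H, W) with the reference branch at index 0. The shapes, values, and branch order are assumptions for illustration, not taken from the photoswap code:

```python
import numpy as np

# Hypothetical toy masks, shape (batch=2, 1, 2, 2).
# Assumed layout: index 0 = reference (reconstruction) branch,
#                 index 1 = editing branch.
mask = np.array([
    [[[True, False], [False, False]]],   # reference mask
    [[[False, False], [False, True]]],   # editing mask
])

# mask[:1] keeps the batch axis, so its shape is (1, 1, 2, 2).
# Broadcasting adds the reference mask to every entry in the batch.
# For boolean arrays, "+" acts as a logical OR.
merged = mask[:1] + mask

print(merged.shape)        # shape is unchanged by the broadcasted add
print(merged[0, 0])        # reference OR reference = reference
print(merged[1, 0])        # reference OR editing = union of both masks
```

Under these assumptions the reference mask is OR-ed into every branch, so the editing branch ends up with the union of both masks. With float masks the same line would sum the values instead.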