questions about the code

1. I found the following code in the modified cross-attention module:

   https://github.com/eric-ai-lab/photoswap/blob/570ca0de4866ed2f01ca274c1d016c0af01c119d/utils.py#L235-L241

   It seems to make `attn_base` and `att_replace` identical in `swapping_class.replace_self_attention`. It also looks like `self_map_replace_steps` should cover `self_output_replace_steps`. Is that correct?

2. I have some difficulty understanding the line `mask = mask[:1] + mask` in `swapping_class.LocalBlend.get_mask`. My understanding is that `mask[0]` is the mask extracted during the reference (reconstruction) process and `mask[1]` is the mask extracted during the editing process. What does this line do?

3. I want to confirm whether the four entries along the first (batch) dimension of q, k, and v in question 1 correspond to [uncond_emb, uncond_emb, source_emb, target_emb].

I would appreciate it if anyone could answer these questions.
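
To make the question about `mask[:1] + mask` concrete, here is a minimal sketch of how I read that line. It uses NumPy as a stand-in for the tensor library, and the 2x2 masks are made-up values, not output from the repository. It also assumes the mask is boolean at that point (for example, after a threshold comparison). If the mask were a float tensor instead, the same line would be an elementwise sum rather than a logical OR.

```python
import numpy as np

# Hypothetical thresholded masks with shape (batch, channel, H, W) = (2, 1, 2, 2).
# Index 0: source (reconstruction) branch; index 1: target (editing) branch.
mask = np.array([
    [[[True, False], [False, False]]],
    [[[False, False], [False, True]]],
])

# mask[:1] keeps the batch axis (shape (1, 1, 2, 2)), so it broadcasts
# against both batch entries of `mask`. For boolean arrays, `+` is a logical OR.
merged = mask[:1] + mask

print(merged[0, 0])  # source OR source: same as the source mask
print(merged[1, 0])  # source OR target: union of the two masks
```

Under these assumptions, the first entry is left unchanged and the second entry becomes the union of the source and target masks, so the blend region would cover the object in both images. I am not sure this is the intended reading.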
