Open Lne27 opened 8 months ago
Yes, I have the same question. The latent feature of each sub-region is cropped directly, without being resized:
https://github.com/YangLing0818/RPG-DiffusionMaster/blob/d2a26e9d199253ee49e75d348d4047d416a5b4e8/cross_attention.py#L127-L128
The cropped features are then fused into the corresponding positions of the base latent features:
https://github.com/YangLing0818/RPG-DiffusionMaster/blob/d2a26e9d199253ee49e75d348d4047d416a5b4e8/cross_attention.py#L129-L133
This does not look like the resizing the paper describes. I'd also like to know why it is done this way. Is it because resizing doesn't make sense here?
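To make the two readings concrete, here is a minimal sketch in plain Python. It is not the repository's code, only a toy illustration on a single-channel latent stored as nested lists. It contrasts "crop at the same positions, then fuse" (what the linked lines appear to do) with "resize the whole regional latent into the box, then fuse" (one reading of the paper). The box layout `(top, left, height, width)`, the blend weight `alpha`, and all function names are hypothetical, and nearest-neighbour resizing is an assumed choice for the resize variant.

```python
from typing import List, Tuple

Grid = List[List[float]]
# Hypothetical box layout: (top, left, height, width) in latent pixels.
Box = Tuple[int, int, int, int]


def crop_and_fuse(base: Grid, regional: Grid, box: Box, alpha: float) -> Grid:
    """Blend the regional latent into the base latent inside `box`,
    reading the regional latent at the SAME coordinates (no resizing)."""
    top, left, h, w = box
    out = [row[:] for row in base]  # copy so the base latent is untouched
    for r in range(top, top + h):
        for c in range(left, left + w):
            out[r][c] = alpha * base[r][c] + (1 - alpha) * regional[r][c]
    return out


def resize_nearest(src: Grid, h: int, w: int) -> Grid:
    """Nearest-neighbour resize of a full grid to h x w (assumed method)."""
    src_h, src_w = len(src), len(src[0])
    return [[src[r * src_h // h][c * src_w // w] for c in range(w)]
            for r in range(h)]


def resize_and_fuse(base: Grid, regional: Grid, box: Box, alpha: float) -> Grid:
    """Shrink the WHOLE regional latent to the box size, then blend it
    into the base latent at the box position."""
    top, left, h, w = box
    small = resize_nearest(regional, h, w)
    out = [row[:] for row in base]
    for r in range(h):
        for c in range(w):
            out[top + r][left + c] = (alpha * base[top + r][left + c]
                                      + (1 - alpha) * small[r][c])
    return out


if __name__ == "__main__":
    base = [[0.0] * 4 for _ in range(4)]
    regional = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    box = (0, 2, 4, 2)  # right half of a 4x4 latent
    print(crop_and_fuse(base, regional, box, 0.0)[1])    # [0.0, 0.0, 6.0, 7.0]
    print(resize_and_fuse(base, regional, box, 0.0)[1])  # [0.0, 0.0, 4.0, 6.0]
```

With `alpha = 0.0` the box is filled entirely from the regional latent. The crop variant copies that latent's own right half, while the resize variant squeezes the whole regional latent (every other column here) into the box, so the two fused results differ.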
I'd like to ask about the regional latent fusion stage: does this method really resize each regional latent to its corresponding position? Looking at the code, it seems that only the latents at the corresponding positions of each regional image are fused, which is quite confusing.