I have a question about the code that builds the instance masks. In the following code, the value of visual_token_masks seems to depend only on self_att_ind_objs. Every element of self_att_ind_objs is either < 1.0 or >= 1.0, so the two masked assignments overwrite every entry of the sum, and self_att_all_objs has no effect on the final value of visual_token_masks. So what is the purpose of visual_token_masks = self_att_all_objs + self_att_ind_objs?
# get the masks for avoiding information leakage between object patches
visual_token_masks = self_att_all_objs + self_att_ind_objs
# avoid the communications between objects and background
visual_token_masks[self_att_ind_objs < 1.0] = 0.0 # objects and background cannot communicate
visual_token_masks[self_att_ind_objs >= 1.0] = 1.0 # binary mask
att_masks_[:,:,:w_h,:w_h] = visual_token_masks.view(B, 1, w_h, w_h)
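To make the observation concrete, here is a minimal sketch that replays the three mask-building lines on toy data. It is not the repository's code: NumPy arrays stand in for the PyTorch tensors, the helper name build_visual_token_masks is hypothetical, and the shapes and values are made up. It checks that two very different self_att_all_objs inputs give the same mask, and that the mask equals a plain threshold on self_att_ind_objs.

```python
import numpy as np


def build_visual_token_masks(self_att_all_objs, self_att_ind_objs):
    """Replay the three mask-building steps from the quoted snippet (hypothetical helper)."""
    # The sum allocates a new array, so the inputs are not modified.
    visual_token_masks = self_att_all_objs + self_att_ind_objs
    # Every element satisfies exactly one of the two conditions below,
    # so every entry of the sum is overwritten.
    visual_token_masks[self_att_ind_objs < 1.0] = 0.0
    visual_token_masks[self_att_ind_objs >= 1.0] = 1.0
    return visual_token_masks


rng = np.random.default_rng(0)

# Assumed toy inputs: 2 samples, 4 visual tokens, so 4x4 pairwise masks.
self_att_ind_objs = rng.integers(0, 3, size=(2, 4, 4)).astype(np.float32)
all_objs_a = rng.random((2, 4, 4), dtype=np.float32)  # arbitrary values
all_objs_b = np.zeros((2, 4, 4), dtype=np.float32)    # all zeros

mask_a = build_visual_token_masks(all_objs_a, self_att_ind_objs)
mask_b = build_visual_token_masks(all_objs_b, self_att_ind_objs)

# A direct threshold on self_att_ind_objs, with no sum at all.
thresholded = (self_att_ind_objs >= 1.0).astype(np.float32)

same_for_any_all_objs = np.array_equal(mask_a, mask_b)
equals_threshold = np.array_equal(mask_a, thresholded)
print(same_for_any_all_objs, equals_threshold)  # True True
```

If this toy reproduction matches the real tensors, the sum could be replaced by the thresholded expression without changing the result, unless self_att_all_objs is meant to take part in the comparison itself.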
Thanks for your excellent work!
Link to the code: https://github.com/frank-xwang/InstanceDiffusion/blob/dadf0e3b09c2de82bf35b24e3424a14197a29906/ldm/modules/attention.py#L233C1-L240C88