Some questions about the code related to instance masks

Thanks for your excellent work!

I have a question about the code that builds the instance masks. In the snippet below, the final value of visual_token_masks seems to depend only on self_att_ind_objs: every element is overwritten with 0.0 or 1.0 according to self_att_ind_objs, so self_att_all_objs has no effect on the result. What, then, is the purpose of visual_token_masks = self_att_all_objs + self_att_ind_objs?

https://github.com/frank-xwang/InstanceDiffusion/blob/dadf0e3b09c2de82bf35b24e3424a14197a29906/ldm/modules/attention.py#L233C1-L240C88

# get the masks for avoiding information leakage between object patches
visual_token_masks = self_att_all_objs + self_att_ind_objs

# avoid the communications between objects and background
visual_token_masks[self_att_ind_objs < 1.0] = 0.0 # objects-background can not communicate
visual_token_masks[self_att_ind_objs >= 1.0] = 1.0 # binary mask

att_masks_[:,:,:w_h,:w_h] = visual_token_masks.view(B, 1, w_h, w_h)
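
To make the observation concrete, here is a minimal sketch that mirrors the three masking lines, using NumPy arrays in place of the PyTorch tensors (an assumption made so the check is self-contained; the boolean-mask assignments work the same way elementwise). The function name build_visual_token_masks, the shapes, and the random values are hypothetical and not taken from the repository. It assumes self_att_ind_objs contains no NaN values, so every element satisfies exactly one of the two conditions.

```python
import numpy as np

def build_visual_token_masks(self_att_all_objs, self_att_ind_objs):
    # Mirrors the quoted lines; the sum allocates a new array, inputs are untouched.
    visual_token_masks = self_att_all_objs + self_att_ind_objs
    # Every element is overwritten by one of the next two assignments.
    visual_token_masks[self_att_ind_objs < 1.0] = 0.0
    visual_token_masks[self_att_ind_objs >= 1.0] = 1.0
    return visual_token_masks

rng = np.random.default_rng(0)
# Hypothetical inputs: ind holds values in {0, 1, 2}; all_a and all_b differ.
ind = rng.integers(0, 3, size=(2, 16)).astype(np.float32)
all_a = rng.random((2, 16), dtype=np.float32)
all_b = np.zeros((2, 16), dtype=np.float32)

out_a = build_visual_token_masks(all_a, ind)
out_b = build_visual_token_masks(all_b, ind)
# The same mask computed without self_att_all_objs at all.
direct = (ind >= 1.0).astype(np.float32)

print(np.array_equal(out_a, out_b))   # same result for different self_att_all_objs
print(np.array_equal(out_a, direct))  # same result as thresholding self_att_ind_objs
```

If this reading is right, the sum could be replaced by a direct threshold on self_att_ind_objs, unless self_att_all_objs was meant to take part in one of the conditions.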
