Input to the self.mask_decoder

Dear authors,

Thanks for sharing the code for this great work! I have a question about the input to the mask_decoder in the code:

According to the paper and the comments in the code, the input to the mask_decoder should be the embedding and the visible mask. However, the code here passes the embedding and the 5th channel of obj_patches_foward (index 4), which seems to be the optical flow x rather than the visible mask (from the dataset here). The visible mask would presumably be obj_patches_foward[..., 3], since obj_patches_foward[..., [0, 1, 2]] is RGB. Would this make a difference, or did I miss something?
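A quick sanity check along the following lines could help tell the channels apart. This is only a sketch: the tensor is synthetic, the (B, T, H, W, C) shape and the channel layout (RGB at 0-2, visible mask at 3, flow x/y at 4-5) are my assumptions rather than something confirmed from the repository, and `looks_like_binary_mask` is a hypothetical helper, not part of the codebase.

```python
import numpy as np


def looks_like_binary_mask(channel, tol=1e-6):
    """Return True if every value is (approximately) 0 or 1."""
    channel = np.asarray(channel, dtype=np.float64)
    near_zero = np.abs(channel) < tol
    near_one = np.abs(channel - 1.0) < tol
    return bool(np.all(near_zero | near_one))


# Synthetic stand-in for obj_patches_foward; shape and layout are assumed.
rng = np.random.default_rng(0)
B, T, H, W = 1, 2, 8, 8
rgb = rng.random((B, T, H, W, 3))                               # channels 0-2
mask = (rng.random((B, T, H, W, 1)) > 0.5).astype(np.float64)   # channel 3
flow = rng.normal(size=(B, T, H, W, 2))                         # channels 4-5
patches = np.concatenate([rgb, mask, flow], axis=-1)

# Report, per channel, whether the first patch looks like a binary mask.
for c in range(patches.shape[-1]):
    print(c, looks_like_binary_mask(patches[0, 0, ..., c]))
```

If the real tensor follows the same layout, only channel 3 should pass this check, while channel 4 should contain continuous (possibly negative) flow values.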

I attach a few plots for different channels of obj_patches_foward.

obj_patches_foward[0, 0, ..., 3]:

obj_patches_foward[0, 0, ..., 4]:

Thank you in advance and looking forward to your reply!

amazon-science / self-supervised-amodal-video-object-segmentation