Thanks for sharing the code for this great work! I have a question about the input to the mask_decoder in the code:
According to the paper and the comments in the code, the input to the mask_decoder should be the embedding and the visible mask. However, the code here passes the embedding and the 5th channel of obj_patches_foward (index 4), which seems to be the optical flow x rather than the visible mask (judging from the dataset here). The visible mask would presumably be obj_patches_foward[..., 3], since obj_patches_foward[..., [0, 1, 2]] is RGB. Would this make a difference, or did I miss something?
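To make the indexing concrete, here is a minimal sketch with a dummy array. The shape, the random contents, and the channel layout (RGB, then visible mask, then optical flow x and y) are my assumptions for illustration only; they are not taken from the repository:

```python
import numpy as np

# Hypothetical stand-in for obj_patches_foward, shaped (batch, objects, H, W, channels).
# Assumed channel layout: 0-2 = RGB, 3 = visible mask, 4 = optical flow x, 5 = optical flow y.
rng = np.random.default_rng(0)
obj_patches_foward = rng.random((2, 3, 8, 8, 6)).astype(np.float32)

# Make channel 3 binary so it behaves like a mask and differs visibly from the flow channel.
obj_patches_foward[..., 3] = (obj_patches_foward[..., 3] > 0.5).astype(np.float32)

rgb = obj_patches_foward[..., [0, 1, 2]]    # first three channels
visible_mask = obj_patches_foward[..., 3]   # 4th channel: what the paper describes
flow_x = obj_patches_foward[..., 4]         # 5th channel: what the code appears to pass


def looks_binary(x):
    """Return True if every value in x is exactly 0 or 1."""
    return bool(np.isin(x, [0.0, 1.0]).all())


print("channel 3 binary:", looks_binary(visible_mask))
print("channel 4 binary:", looks_binary(flow_x))
```

A check like `looks_binary` on the real tensor would be a quick way to confirm which channel actually holds the mask.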
I have attached a few plots of different channels of obj_patches_foward:
obj_patches_foward[0, 0, ..., 3]:
obj_patches_foward[0, 0, ..., 4]:
Thank you in advance. I look forward to your reply!
Good catch! We had actually noticed this issue before, and in our experiments the results with the visible mask and with the optical flow x as input were similar.