facebookresearch / Mask2Former

Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
MIT License

Question about the second formula of the paper #178

Open Ceasar9999 opened 1 year ago

Ceasar9999 commented 1 year ago

I think I found a mistake. Specifically, the second equation in your paper differs from your code: Equation 2 of the paper adds the mask to QK, but in your code you use the function 'masked_fill' to apply the mask to QK, which looks like a multiplication of the mask with QK rather than an addition. Could you please explain?
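For reference, Eq. 2 of the paper computes masked attention as $\mathbf{X}_l = \operatorname{softmax}(\mathcal{M}_{l-1} + \mathbf{Q}_l \mathbf{K}_l^{\mathrm{T}})\mathbf{V}_l + \mathbf{X}_{l-1}$, where the attention mask $\mathcal{M}_{l-1}$ is 0 at allowed locations and $-\infty$ elsewhere. A minimal sketch of why the two formulations agree: adding a 0/$-\infty$ mask before the softmax and overwriting the disallowed logits with $-\infty$ via `masked_fill` produce identical attention weights, since $x + (-\infty) = -\infty$. The tensor names below are illustrative, not taken from the repo.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(2, 4, 6)        # attention logits QK^T: (batch, queries, keys)
keep = torch.rand(2, 4, 6) > 0.3     # True where attention is allowed

# Paper's Eq. 2: add a mask that is 0 where allowed and -inf where masked.
additive = torch.zeros_like(scores).masked_fill(~keep, float("-inf"))
attn_paper = F.softmax(scores + additive, dim=-1)

# Code's version: overwrite the disallowed logits with -inf directly.
attn_code = F.softmax(scores.masked_fill(~keep, float("-inf")), dim=-1)

# The two agree everywhere (equal_nan covers any fully masked rows).
assert torch.allclose(attn_paper, attn_code, equal_nan=True)
```

So `masked_fill` here is not a multiplication: it is just an equivalent way of applying the additive $-\infty$ mask from the paper.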

TemugeB commented 1 year ago

I have the same question. Also, a mask row could become all -inf if everything fell below the threshold; after the softmax those rows are NaN, which kills backpropagation. How should one mask properly in this case?
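One common way to avoid the NaN is to detect query rows whose mask is entirely True (everything disallowed) and reset them to all False, so those queries fall back to ordinary, unmasked cross-attention; as far as I can tell, the released decoder does something along these lines when it builds `attn_mask`. A minimal sketch with illustrative names:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(2, 4, 6)          # attention logits: (batch, queries, keys)
attn_mask = torch.rand(2, 4, 6) > 0.5  # True = masked out (PyTorch convention)
attn_mask[0, 1] = True                 # force one query row to be fully masked

# Naive masking: the fully masked row is all -inf, so softmax returns NaN.
naive = F.softmax(scores.masked_fill(attn_mask, float("-inf")), dim=-1)
print(torch.isnan(naive).any())        # tensor(True)

# Fix: un-mask rows that are entirely masked, so those queries attend
# everywhere instead of producing NaNs (and dead gradients).
fully_masked = attn_mask.sum(-1) == attn_mask.shape[-1]
attn_mask[fully_masked] = False
safe = F.softmax(scores.masked_fill(attn_mask, float("-inf")), dim=-1)
print(torch.isnan(safe).any())         # tensor(False)
```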