Closed: shkarupa-alex closed this issue 2 years ago
In the original Swin implementation, the last BasicLayer (with 2 SwinTransformerBlocks) does not use an attention mask:
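For reference, here is a minimal standalone sketch of that logic, paraphrased from the upstream PyTorch `SwinTransformerBlock` (the `compute_attn_mask` helper is hypothetical, introduced here only to make the guard runnable in isolation):

```python
import torch

def compute_attn_mask(input_resolution, window_size, shift_size):
    """Sketch of the mask logic in the official SwinTransformerBlock.

    Mirrors the upstream guard: when the window covers the whole feature
    map, the cyclic shift is disabled and no attention mask is built.
    """
    H, W = input_resolution
    if min(input_resolution) <= window_size:
        # Window >= input resolution (the last-stage case): disable the
        # shift and clamp the window. This is the condition in question.
        shift_size = 0
        window_size = min(input_resolution)
    if shift_size == 0:
        # Non-shifted W-MSA blocks need no mask.
        return None
    # Shifted SW-MSA: label the nine regions created by the cyclic shift
    # and forbid attention between tokens from different regions.
    img_mask = torch.zeros((1, H, W, 1))
    slices = (slice(0, -window_size),
              slice(-window_size, -shift_size),
              slice(-shift_size, None))
    cnt = 0
    for h in slices:
        for w in slices:
            img_mask[:, h, w, :] = cnt
            cnt += 1
    # Partition the label map into (num_windows, window_size * window_size).
    mask_windows = img_mask.view(1, H // window_size, window_size,
                                 W // window_size, window_size, 1)
    mask_windows = mask_windows.permute(0, 1, 3, 2, 4, 5).reshape(
        -1, window_size * window_size)
    # Pairs with differing labels get a large negative bias before softmax.
    attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)
    return attn_mask.masked_fill(attn_mask != 0, -100.0)
```

With this sketch, `compute_attn_mask((7, 7), 7, 3)` returns `None` (the last stage, where the window covers the whole map), while `compute_attn_mask((14, 14), 7, 3)` returns a real mask.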
But your SwinTransformerBlock implementation does not include such a condition, so the first SwinTransformerBlock will be computed WITH an attention mask.
Is this an error, or did you do it on purpose? Will it harm performance or boost it?
The Swin Transformer backbone is taken from the official implementation for semantic segmentation without any modification: https://github.com/SwinTransformer/Swin-Transformer-Semantic-Segmentation
I would suggest redirecting your question to the original Swin Transformer authors.