Closed magehrig closed 1 week ago
Hi @magehrig, thanks for your attention to our work.
The Swin window size in stage 3 is 7x7, because the window size set in the config only controls the window size for the linear attention blocks. In stages 3 and 4, the original shifted window attention keeps a 7x7 window for the 224x224 input. This is achieved by this line and the config.
I see. Thanks!
Thanks for open sourcing your work.
Unless I am mistaken, you have a Swin window size of 14x14 in stage 3 (res3, with 14x14 resolution), because you set the window size to 56 here and just skip windowing if the window size is larger than or equal to the resolution here. Hence, you don't actually have any window attention in Swin. This is not consistent with Table 14 in the appendix of your paper. Just wanted to let you know; I'm not sure if that changes anything.
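To make the concern concrete, here is a minimal sketch of the clamping behaviour described above. It is not the repository's code: `effective_window_size` and `windows_per_stage` are hypothetical helpers, and the stage resolutions assume the standard Swin layout for a 224x224 input (patch size 4, 2x downsampling between stages).

```python
# Feature-map side length per stage for a 224x224 input, assuming the
# standard Swin layout (patch size 4, 2x downsampling between stages).
STAGE_RESOLUTIONS = [56, 28, 14, 7]


def effective_window_size(resolution: int, window_size: int) -> int:
    """Hypothetical helper: a window that is at least as large as the
    feature map collapses to one window covering the whole map."""
    return min(window_size, resolution)


def windows_per_stage(window_size: int) -> list[tuple[int, int]]:
    """Return (effective window size, number of windows) for each stage."""
    result = []
    for resolution in STAGE_RESOLUTIONS:
        w = effective_window_size(resolution, window_size)
        num_windows = (resolution // w) ** 2
        result.append((w, num_windows))
    return result


# A configured window size of 56 gives one global window in every stage.
global_case = windows_per_stage(56)

# A configured window size of 7 gives true windowed attention in stages 1-3.
windowed_case = windows_per_stage(7)
```

Under these assumptions, a window size of 56 yields a 14x14 window in stage 3 (a single window, so effectively global attention), whereas a window size of 7 yields four 7x7 windows in that stage.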