LeapLabTHU / FLatten-Transformer

Official repository of FLatten Transformer (ICCV2023)

window size swin #29

Closed magehrig closed 1 week ago

magehrig commented 1 week ago

Thanks for open sourcing your work.

Unless I am mistaken, you effectively have a Swin window size of 14x14 at stage 3 (res3, 14x14 resolution), because you set the window size to 56 here and simply skip windowing when the window size is larger than or equal to the resolution here. Hence, you don't actually have any window attention in Swin. This is not consistent with Table 14 in the appendix of your paper. Just wanted to let you know; not sure if that changes anything.

tian-qing001 commented 1 week ago

Hi @magehrig, thanks for your attention to our work.

The Swin window size is 7x7 in stage 3. The window size set in the config only controls the window size of the linear attention blocks. In stages 3 and 4, the window size of the original shifted window attention is kept at 7x7 for the 224x224 input. This is achieved by this line and the config.
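The distinction can be sketched as follows. This is an illustrative snippet, not the repository's actual code: the function name and the specific window sizes per block type are assumptions based on the discussion above (linear attention blocks configured with a large window, shifted window attention blocks fixed at 7x7).

```python
# Hypothetical sketch of the windowing behavior discussed above.
# Names are illustrative, not taken from the FLatten-Transformer code.

def effective_window_size(input_resolution, window_size):
    """Return the window size actually used at a given feature-map resolution.

    When the configured window is at least as large as the feature map,
    window partitioning is skipped and attention is global over the map.
    """
    if window_size >= min(input_resolution):
        return min(input_resolution)  # global attention, no windowing
    return window_size  # standard windowed attention

# Linear attention blocks configured with window size 56 at stage 3
# (14x14 resolution) effectively attend globally:
print(effective_window_size((14, 14), 56))  # 14 -> windowing skipped

# Shifted window attention blocks keep their 7x7 windows at stage 3:
print(effective_window_size((14, 14), 7))   # 7 -> windowed attention
```

Under this reading, the large configured window only disables windowing for the linear attention blocks, while the shifted window attention path still partitions into 7x7 windows, consistent with the reply above.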

magehrig commented 1 week ago

I see. Thanks!