LeapLabTHU / Agent-Attention

Official repository of Agent Attention (ECCV2024)

How to decide the window size in agent_swin #27

Closed: Doraemonzm closed this issue 6 months ago

Doraemonzm commented 6 months ago

Thank you for your excellent work! I notice that the original window size in Swin-T is 7, whereas in agent_swin it is 56. I am curious about your design choices for the window size and the per-stage attention types in agent-swin-T/S/B. Are there any guiding principles behind these decisions?

tian-qing001 commented 6 months ago

Hi @Doraemonzm, thank you for your attention to our work.

As outlined in Sec. 5.5 of our paper, the linear complexity of our module means that the Agent-Swin model's computational cost stays essentially unchanged as the window size grows. We therefore use a window size of 56 to benefit from a global receptive field. In contrast, enlarging Swin-T's window size from 7 to 56 would raise its FLOPs from 4.5G to 8.8G.
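To make the scaling concrete, here is a back-of-envelope Python sketch (an illustration, not the repository's actual FLOP counter): it counts only the two attention matmuls per window, using Swin-T's stage-1 feature-map shape and a hypothetical agent token count `n=49`.

```python
# Back-of-envelope sketch of the scaling argument (not the paper's exact
# FLOP counter). Counts only the two attention matmuls per window; the
# agent token count n and the stage-1 shape below are illustrative assumptions.

def window_attn_flops(H, W, d, w):
    # Softmax attention over each w*w window: QK^T and attn@V are both
    # (w^2 x w^2 x d) matmuls, counting ~2 FLOPs per multiply-add.
    return (H // w) * (W // w) * 4 * (w * w) ** 2 * d

def agent_attn_flops(H, W, d, w, n):
    # Agent attention replaces the square attention map with two rectangular
    # ones of shape (n x w^2) and (w^2 x n), so the cost is linear in w^2.
    return (H // w) * (W // w) * 8 * (w * w) * n * d

H = W = 56; d = 96  # Swin-T stage-1 feature map at 224x224 input
for w in (7, 56):
    print(f"w={w:2d}  window: {window_attn_flops(H, W, d, w) / 1e9:.3f} G  "
          f"agent(n=49): {agent_attn_flops(H, W, d, w, n=49) / 1e9:.3f} G")
```

Running it shows the per-layer window-attention cost growing by (56/7)^2 = 64x while the agent version is unchanged, which is the direction of the 4.5G to 8.8G whole-model gap quoted above.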

Our design principle for the per-stage attention types is to use agent attention in the earlier stages, where feature-map resolutions are higher, so as to fully leverage the benefits of the enlarged receptive field.

Doraemonzm commented 6 months ago

Thanks for your quick reply. Could you please share the code for calculating FLOPs?

tian-qing001 commented 6 months ago

We use fvcore to calculate FLOPs.
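For reference, a minimal sketch of counting FLOPs with fvcore's `FlopCountAnalysis`; `build_model()` is a placeholder, not an actual function from this repo, so substitute however Agent-Swin is constructed here.

```python
import torch
from fvcore.nn import FlopCountAnalysis, flop_count_table

# Placeholder: construct Agent-Swin however this repo exposes it;
# `build_model` is not a real function name from the repository.
model = build_model()
model.eval()

# Standard 224x224 ImageNet input used for the FLOP numbers above.
inputs = torch.randn(1, 3, 224, 224)

flops = FlopCountAnalysis(model, inputs)
print(f"GFLOPs: {flops.total() / 1e9:.2f}")  # fvcore counts one multiply-add as one FLOP
print(flop_count_table(flops))               # per-module breakdown
```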