Westlake-AI / MogaNet

[ICLR 2024] MogaNet: Efficient Multi-order Gated Aggregation Network
https://arxiv.org/abs/2211.03295
Apache License 2.0
155 stars, 12 forks

What do the two Subtract operations mean? #19

Open CacatuaAlan opened 2 months ago

CacatuaAlan commented 2 months ago

Hi! I noticed that your paper has two Subtract operations, which confuse me. Can I just regard them as a decoupling step?

Lupin1998 commented 3 weeks ago

Hi, @CacatuaAlan, sorry for the late reply! You can regard the two subtract operations in the feature decomposition (FD) and the channel aggregation (CA) modules as removing the low-pass components. We then adaptively recombine the low-pass components with the remaining components to increase feature diversity (i.e., to enhance the modeling of middle-order interactions). Feel free to ask if new problems come up, and star our repo if it's helpful to your project!
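For readers following the thread, here is a minimal NumPy sketch of the two subtractions as described above. It is not the repository's code. The function names, the tensor layout `(C, H, W)`, the learnable scales `gamma_s` and `gamma_c`, and the GELU applied to the channel-reduced map are all assumptions made for illustration; check the paper and the repo for the exact formulation. FD is taken to subtract the per-channel global average (GAP), and CA to subtract a single-channel map produced by a 1x1 convolution.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def fd_subtract(x, gamma_s):
    """Feature-decomposition-style reweighting (hypothetical helper).

    x:       feature map of shape (C, H, W)
    gamma_s: per-channel scale of shape (C, 1, 1)
    """
    # GAP: one scalar per channel, i.e. the spatially constant (low-pass) part
    gap = x.mean(axis=(1, 2), keepdims=True)
    # add back the scaled remainder after removing the low-pass part
    return x + gamma_s * (x - gap)


def ca_subtract(x, w_r, gamma_c):
    """Channel-aggregation-style reweighting (hypothetical helper).

    x:       feature map of shape (C, H, W)
    w_r:     weights of a 1x1 conv reducing C channels to 1, shape (C,)
    gamma_c: per-channel scale of shape (C, 1, 1)
    """
    # a 1x1 conv to one channel is a weighted sum over channels at each pixel
    reduced = gelu(np.tensordot(w_r, x, axes=(0, 0)))[None]  # (1, H, W)
    # the single-channel map is broadcast over all C channels and subtracted
    return x + gamma_c * (x - reduced)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
z_fd = fd_subtract(x, gamma_s=np.ones((4, 1, 1)))
z_ca = ca_subtract(x, w_r=rng.standard_normal(4), gamma_c=np.ones((4, 1, 1)))
```

In this sketch both branches have the form `x + gamma * (x - low_pass(x))`, and only the choice of `low_pass` differs: a spatial average in FD and a channel-wise projection in CA. With `gamma` set to zero, either function returns its input unchanged.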

CacatuaAlan commented 3 weeks ago

Thank you for your patient explanation. In the two subtraction operations within the FD and CA modules, the subtracted components are different operations (GAP and conv1x1). Is there any special design or experimental comparison for this?
