Westlake-AI / MogaNet

[ICLR 2024] MogaNet: Efficient Multi-order Gated Aggregation Network
https://arxiv.org/abs/2211.03295
Apache License 2.0

Code Issue about MultiOrderGatedAggregation #5

Closed 123456789asdfjkl closed 1 year ago

123456789asdfjkl commented 1 year ago

https://github.com/Westlake-AI/MogaNet/blob/cd53ea044b15ff6639d86d04aee2298d0e6b8de7/models/moganet.py#L264-L333

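Hi! Thank you for your great work! The implementation of the MultiOrderGatedAggregation module does not match the paper: the figure in the paper shows no shortcut, and the FD activation function there is GELU. Which one should I follow?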

Lupin1998 commented 1 year ago

Hi, @123456789asdfjkl, thanks for your detailed question. I have checked the code and our paper. Fig. 4 has the shortcut (the same as the implementation), while the activation function of the FD module in Fig. 4 should be SiLU, as in the code. Fig. 4 was out of date. In practice, using GELU or SiLU as the activation in the FD yields similar performance. Please follow our code implementation whenever the code and our arXiv preprint conflict. We will update the arXiv revision soon to add more results and fix typos. Overall, thank you for using MogaNet and pointing out the typo.
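As a quick illustration of how close the two activations are in shape, here is a minimal pure-Python sketch. It is not code from the repository: the helper names `silu`, `gelu`, and `max_abs_gap` are made up for this example, and the grid range is an arbitrary assumption. It only compares function values and says nothing about accuracy on its own.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_abs_gap(lo: float = -6.0, hi: float = 6.0, steps: int = 1201) -> float:
    """Largest |SiLU(x) - GELU(x)| on an evenly spaced grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(silu(x) - gelu(x)) for x in xs)


if __name__ == "__main__":
    print(f"max |SiLU - GELU| on [-6, 6]: {max_abs_gap():.3f}")
```

Both functions pass through zero, approach the identity for large positive inputs, and approach zero for large negative inputs, so the pointwise gap stays small over the whole grid.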

123456789asdfjkl commented 1 year ago

Hello, I think that, going by Fig. 4, the shortcut should correspond to the code below. What do you think?

https://github.com/Westlake-AI/MogaNet/blob/cd53ea044b15ff6639d86d04aee2298d0e6b8de7/models/moganet.py#L403-405

Lupin1998 commented 1 year ago

Umm, you are right. I overlooked the shortcut inside the MultiOrderGatedAggregation module. There should be another shortcut running in parallel with the FD module and the Multi-order Gated Aggregation module. Fig. 4 will be corrected as follows. Thank you very much for your advice, and we will fix this typo in the arXiv revision.
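To make the two shortcuts discussed in this thread concrete, here is a toy pure-Python sketch of the structure. It is not the repository's code: `gated_aggregation`, `block`, `body`, `norm`, and `layer_scale` are hypothetical names, the FD step and the gating branches are collapsed into a single `body` callable, and plain lists of floats stand in for tensors.

```python
from typing import Callable, List

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [p + q for p, q in zip(a, b)]


def gated_aggregation(x: Vector, body: Callable[[Vector], Vector]) -> Vector:
    """Inner shortcut: the module's input is added back to its own output."""
    shortcut = list(x)
    return add(body(x), shortcut)


def block(
    x: Vector,
    norm: Callable[[Vector], Vector],
    body: Callable[[Vector], Vector],
    layer_scale: float = 1.0,
) -> Vector:
    """Outer shortcut: runs in parallel with norm -> gated aggregation."""
    identity = list(x)
    y = gated_aggregation(norm(x), body)
    return add(identity, [layer_scale * v for v in y])


if __name__ == "__main__":
    # With an identity norm and a body that outputs zeros, only the two
    # shortcut paths carry signal, so each input element is doubled.
    print(block([1.0, 2.0], norm=lambda v: v, body=lambda v: [0.0] * len(v)))
```

Setting `body` to return zeros isolates the skip paths: the inner shortcut passes the normalized input through the module, and the outer shortcut adds the block input on top of that.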