Westlake-AI / MogaNet

[ICLR 2024] MogaNet: Efficient Multi-order Gated Aggregation Network
https://arxiv.org/abs/2211.03295
Apache License 2.0

depths #20

Closed liqiangde closed 2 weeks ago

liqiangde commented 2 months ago

Why are the depths for xtiny (3, 3, 12, 2) while the depths for the small version are (2, 3, 12, 2)? Is the stage-1 depth of 2 in the small version a special setting?

Lupin1998 commented 2 months ago

Hi, @liqiangde. Thanks for your question and attention! That's right, we specially tuned the depth configuration for MogaNet-Small based on performance and parameter count. Small models (around 25M parameters) usually adopt depths such as (3, 3, 6, 3), (3, 3, 9, 3), or (3, 3, 10, 3) with different embedding dimensions. Adding blocks in stage 1 costs more FLOPs, while adding blocks in stage 4 costs more parameters. We therefore reduced the number of blocks in stage 1 and stage 4 to get a better trade-off between performance and FLOPs and parameters. Feel free to ask if you have more questions, and star our repo if it's helpful to your project!
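To make the FLOPs versus parameters point concrete, here is a rough sketch of a toy cost model. It is not MogaNet's actual block: each block is approximated as one depthwise convolution plus a two-layer channel MLP, and the embedding dimensions, kernel size, and MLP ratio below are hypothetical values chosen only for illustration. The feature-map sizes assume a 224x224 input with strides 4, 8, 16, and 32.

```python
def block_cost(channels, resolution, mlp_ratio=4, kernel=5):
    """Return (params, MACs) for one toy block; biases and norms are ignored."""
    dense = 2 * mlp_ratio * channels * channels  # pointwise layers C -> rC -> C
    depthwise = kernel * kernel * channels       # one k x k depthwise convolution
    params = dense + depthwise
    macs = params * resolution * resolution      # each weight is applied at every position
    return params, macs


def model_cost(depths, dims, resolutions):
    """Sum toy block costs over all stages for a given depth configuration."""
    params = macs = 0
    for depth, channels, resolution in zip(depths, dims, resolutions):
        p, m = block_cost(channels, resolution)
        params += depth * p
        macs += depth * m
    return params, macs


DIMS = (64, 128, 256, 512)     # hypothetical embedding dims, not MogaNet's
RESOLUTIONS = (56, 28, 14, 7)  # assumes 224x224 input, strides 4/8/16/32

stage1 = block_cost(DIMS[0], RESOLUTIONS[0])
stage4 = block_cost(DIMS[3], RESOLUTIONS[3])
print(f"stage-1 block: {stage1[0] / 1e6:.2f}M params, {stage1[1] / 1e6:.1f}M MACs")
print(f"stage-4 block: {stage4[0] / 1e6:.2f}M params, {stage4[1] / 1e6:.1f}M MACs")

# Effect of dropping one stage-1 block, with the dims held fixed.
full = model_cost((3, 3, 12, 2), DIMS, RESOLUTIONS)
reduced = model_cost((2, 3, 12, 2), DIMS, RESOLUTIONS)
print(f"saved: {(full[0] - reduced[0]) / 1e6:.2f}M params, "
      f"{(full[1] - reduced[1]) / 1e6:.1f}M MACs")
```

Under these assumptions, a stage-1 block holds very few parameters but runs over the largest feature map, so removing it mostly saves compute. A stage-4 block runs over the smallest feature map but holds most of its cost in weights, so removing it mostly saves parameters.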

Lupin1998 commented 2 weeks ago

If you have no more questions, I will close this issue. Feel free to open a new issue if more questions come up.