Westlake-AI / MogaNet

[ICLR 2024] MogaNet: Efficient Multi-order Gated Aggregation Network
https://arxiv.org/abs/2211.03295
Apache License 2.0

cooldown epochs #13

Closed dhkim0225 closed 8 months ago

dhkim0225 commented 8 months ago

Thank you for your great work!

As far as I know, models such as DeiT and ConvNeXt do not use "cooldown_epochs". However, the code suggests that MogaNet was trained for 310 epochs rather than 300. Are the accuracies in the paper posted on OpenReview all obtained from 310-epoch training?

Lupin1998 commented 8 months ago

Hi @dhkim0225, thanks for your question! The "cooldown_epochs" is not a necessary part of the training setup for MogaNet, and we also provide 300-epoch implementations and results in OpenMixup. The "cooldown_epochs" option in timm is part of our default training setup because the image classification implementation was migrated from PoolFormer; it has little effect on the final performance. It might be useful for some Transformer architectures, e.g., Uniformer. To my knowledge, whether a model is trained for 300 or 310 epochs has little to do with whether its manuscript is posted on OpenReview.
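For readers wondering where the extra 10 epochs come from, here is a minimal sketch of how a timm-style training loop typically handles cooldown: the cosine schedule decays over `epochs`, and `cooldown_epochs` extra epochs are appended at the end with the learning rate held at `lr_min`. The concrete values (300 / 10 / 1e-3 / 1e-5) and the toy model are illustrative assumptions, not the repo's exact configuration.

```python
import torch
from timm.scheduler import CosineLRScheduler

epochs = 300           # length of the cosine decay
cooldown_epochs = 10   # extra epochs appended after the decay finishes

model = torch.nn.Linear(8, 8)                                   # toy model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

scheduler = CosineLRScheduler(
    optimizer,
    t_initial=epochs,  # cosine decays over the first 300 epochs
    lr_min=1e-5,       # value the LR is held at during cooldown
)

num_epochs = epochs + cooldown_epochs  # -> 310 epochs actually run
for epoch in range(num_epochs):
    # ... one epoch of training ...
    scheduler.step(epoch + 1)  # epochs 300..309 keep the LR at lr_min
```

Since the last 10 epochs run at the minimum learning rate, they change the weights very little, which is consistent with the observation above that cooldown has little effect on the final accuracy.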