Sense-X / UniFormer

[ICLR2022] official implementation of UniFormer
Apache License 2.0

One 7x7 conv vs. two 3x3 conv #111

Closed LMMMEng closed 1 year ago

LMMMEng commented 1 year ago

Thank you for your wonderful work!

Are two 3x3 convs (stride=2) used as the stem in place of one 7x7 conv (stride=4) because the former leads to better results?

Andy1621 commented 1 year ago

Yes. Two 3x3 convs not only save computation but also achieve slightly better results.
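
For readers comparing the two stems, here is a minimal sketch (not taken from the UniFormer code) that checks both variants downsample the input by the same overall factor of 4. The padding values (3 for the 7x7 conv, 1 for each 3x3 conv) and the 224-pixel input are assumptions chosen for illustration, and the helper names are hypothetical. It only verifies spatial sizes and says nothing about computation or accuracy.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output size of a convolution along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_out_size(size, layers):
    """Apply a sequence of (kernel, stride, padding) convs to a spatial size."""
    for kernel, stride, padding in layers:
        size = conv_out_size(size, kernel, stride, padding)
    return size


# Assumed layer settings: (kernel, stride, padding). The padding values are
# illustrative assumptions and are not stated in this thread.
SINGLE_7X7 = [(7, 4, 3)]
DOUBLE_3X3 = [(3, 2, 1), (3, 2, 1)]

if __name__ == "__main__":
    input_size = 224  # assumed ImageNet-style input resolution
    for name, layers in (("one 7x7, stride 4", SINGLE_7X7),
                         ("two 3x3, stride 2", DOUBLE_3X3)):
        print(name, "->", stem_out_size(input_size, layers))
```

Under these assumptions both stems map a 224x224 input to the same spatial resolution, so the two stride-2 convs can replace the single stride-4 conv without changing the shapes seen by later stages.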

LMMMEng commented 1 year ago

Thank you! Do you remember exactly how much improvement there was on ImageNet?

Andy1621 commented 1 year ago

Sorry, I'm not sure. But using two 3x3 convs is a popular modification in current vision transformers. You can simply adopt the better setting.

LMMMEng commented 1 year ago

Got it, thank you!