THU-MIG / RepViT

RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
https://arxiv.org/abs/2307.09283
Apache License 2.0
730 stars 55 forks

SE selection RepViTBlock #9

Closed Amshaker closed 1 year ago

Amshaker commented 1 year ago

Hi,

Thank you for sharing your work!

I have a question: on what basis do you choose whether or not to use SE in each RepViTBlock (the "green column")? The pattern is not consistent. Are you using any neural architecture search?

[image]

jameslahm commented 1 year ago

Hi, thanks for your question. The part that looks inconsistent is actually the downsampling layers: we don't use SE in the downsampling layers. Within the blocks of each stage, the pattern of using SE or not is consistent. The implementation in timm (https://github.com/huggingface/pytorch-image-models/blob/81089b10a24b3a780988a1cdf075dd1de9c17042/timm/models/repvit.py#L222) may be clearer. Besides, we don't use any neural architecture search.
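
To make the pattern described above concrete, here is a minimal sketch in plain Python (no PyTorch) that lists which layers would carry SE. It is not the authors' code. The helper names `se_flags_for_stage` and `se_flags`, and the depths `[2, 2, 4, 2]`, are hypothetical and chosen only for illustration. The sketch also assumes that blocks within a stage alternate SE on/off starting with SE on; the reply only says the per-stage pattern is consistent, so check the linked timm file for the actual rule.

```python
from typing import List, Tuple


def se_flags_for_stage(depth: int, downsample: bool) -> List[Tuple[str, bool]]:
    """Return (layer_kind, use_se) pairs for one stage (illustrative only)."""
    layers: List[Tuple[str, bool]] = []
    if downsample:
        # Downsampling layers never use SE (stated in the reply above).
        layers.append(("downsample", False))
    # ASSUMPTION: the first block of a stage uses SE and the flag then
    # alternates block by block. Verify against the linked timm source.
    use_se = True
    for _ in range(depth):
        layers.append(("block", use_se))
        use_se = not use_se
    return layers


def se_flags(depths: List[int]) -> List[List[Tuple[str, bool]]]:
    """Build the SE layout for all stages; only stages after the first downsample."""
    return [se_flags_for_stage(d, downsample=(i > 0)) for i, d in enumerate(depths)]


if __name__ == "__main__":
    # Hypothetical depths, not taken from any published RepViT variant.
    for i, stage in enumerate(se_flags([2, 2, 4, 2])):
        row = ", ".join(f"{kind}:{'SE' if se else '-'}" for kind, se in stage)
        print(f"stage {i}: {row}")
```

Read as a flat column, the downsampling entries interrupt the otherwise regular per-stage pattern, which would explain why the SE column in a configuration table can look inconsistent.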