Closed Amshaker closed 1 year ago
Hi, thanks for your question. The part that looks inconsistent is actually the downsampling layers: we don't use SE there. Within the blocks of each stage, the pattern of whether SE is used is consistent. The implementation in timm https://github.com/huggingface/pytorch-image-models/blob/81089b10a24b3a780988a1cdf075dd1de9c17042/timm/models/repvit.py#L222 may be clearer. Also, we don't use any neural architecture search.
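To make the point about downsampling layers concrete, here is a minimal sketch in plain Python (no PyTorch) of how a per-layer SE table can look irregular even though the per-stage rule is regular. It assumes the in-stage rule is a simple alternation that starts with SE enabled; that is my reading of the linked timm code and should be checked against it. The function names and the example depths are hypothetical, not taken from the repository.

```python
from typing import List, Sequence, Tuple


def se_flags_for_stage(depth: int, downsample: bool) -> List[Tuple[str, bool]]:
    """Return (layer_kind, uses_se) pairs for one stage.

    Assumption: blocks alternate SE on/off, starting with SE on.
    The downsampling layer never uses SE (stated in the reply above).
    """
    layers: List[Tuple[str, bool]] = []
    if downsample:
        layers.append(("downsample", False))
    use_se = True  # assumed starting value for each stage
    for _ in range(depth):
        layers.append(("block", use_se))
        use_se = not use_se  # assumed alternation within the stage
    return layers


def se_flags_for_model(depths: Sequence[int]) -> List[Tuple[str, bool]]:
    """Concatenate stages; every stage after the first starts with a downsample."""
    layers: List[Tuple[str, bool]] = []
    for stage_index, depth in enumerate(depths):
        layers.extend(se_flags_for_stage(depth, downsample=stage_index > 0))
    return layers


if __name__ == "__main__":
    # Hypothetical depths, chosen only for illustration.
    for kind, uses_se in se_flags_for_model([2, 3]):
        print(f"{kind:10s} SE={uses_se}")
```

Read as one flat column, the flags for the hypothetical depths `[2, 3]` are on, off, off, on, off, on: the extra "off" comes from the downsampling layer, which is what makes the column look inconsistent.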
Hi,
Thank you for sharing your work!
I have a question: on what basis do you choose whether to use SE in each RepViTBlock (the "green column")? The pattern does not look consistent. Are you using any neural architecture search?