THU-MIG / RepViT

RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
https://arxiv.org/abs/2307.09283
Apache License 2.0
681 stars, 55 forks

About using RepViT as an encoder for image feature extraction #51

Open chaoying0115 opened 3 months ago

chaoying0115 commented 3 months ago

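Many thanks to the team for the excellent work. The paper mentions that using RepViT as the encoder for Depth Anything improves the metrics. I plugged RepViT into a monocular depth estimation model: I take the feature maps output by the new RepViT backbone, downsample and slice them to the expected shapes, and feed them into the original model. Compared with the baseline (whose encoder is MPViT, CVPR 2022), the metrics still fall well short. [image]

I suspect the problem lies in the channel design: the baseline's channel configuration may not be the best match for RepViT. Is there anything to watch out for in the channel design when using RepViT as an encoder? Are there any recommended reading materials or directions for improvement?

This is how I process the RepViT outputs: [image]

These are the channel counts of the baseline encoder and decoder: [image]

I look forward to your reply. Thank you very much!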

jameslahm commented 3 months ago

Thanks for your interest. We think the padding and slicing operations on channels may impair performance. We suggest introducing extra projection layers, e.g., 1×1 convolution layers, to align the number of channels in each RepViT feature map with the number of channels you want, rather than directly padding or slicing channels.
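For readers who want to try this, here is a minimal PyTorch sketch of such projection layers. It is only an illustration of the idea, not code from the RepViT repository: the `ChannelProjector` class and the channel counts below are hypothetical and should be replaced with the actual backbone and decoder widths.

```python
import torch
import torch.nn as nn


class ChannelProjector(nn.Module):
    """Hypothetical helper: one learnable 1x1 conv per feature level."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        # A 1x1 convolution mixes channels at each pixel and leaves the
        # spatial size unchanged, so no channel is discarded or zero-padded.
        self.projs = nn.ModuleList(
            [
                nn.Conv2d(c_in, c_out, kernel_size=1)
                for c_in, c_out in zip(in_channels, out_channels)
            ]
        )

    def forward(self, features):
        # features: list of tensors, one per backbone stage (N, C_i, H_i, W_i)
        return [proj(f) for proj, f in zip(self.projs, features)]


# Assumed channel counts, chosen only for illustration.
backbone_channels = [48, 96, 192, 384]
decoder_channels = [64, 128, 216, 288]

projector = ChannelProjector(backbone_channels, decoder_channels)

# Dummy multi-scale feature maps standing in for the backbone outputs.
features = [
    torch.randn(1, c, 64 // 2**i, 64 // 2**i)
    for i, c in enumerate(backbone_channels)
]

with torch.no_grad():
    projected = projector(features)

output_shapes = [tuple(p.shape) for p in projected]
```

The projection layers would be trained jointly with the rest of the model, so the network learns how to map the backbone channels onto the decoder's expected widths instead of having channels cut off or filled with zeros.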

chaoying0115 commented 3 months ago

OK, thank you very much!