Why not perform layer fusion in the downstream models?

THU-MIG / RepViT

RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything

https://arxiv.org/abs/2307.09283

Apache License 2.0


Why not perform layer fusion in the downstream models? #73

Open hyz-xmaster opened 3 weeks ago

hyz-xmaster commented 3 weeks ago

I noticed that you don't use this function to perform structural re-parameterization (layer fusion) on the RepViT backbone in downstream tasks such as detection and RepViT-SAM, even though doing so could considerably improve the inference speed of these models.
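For readers unfamiliar with the term, the core step of this kind of layer fusion is folding an inference-mode BatchNorm into the convolution that precedes it. Because BN at inference is a fixed per-channel affine map, the two layers collapse into one conv with rescaled weights and a new bias. The sketch below shows that step in plain NumPy. It is not RepViT's own fusion code: the helper names `conv2d`, `batchnorm_eval` and `fuse_conv_bn` are hypothetical, and it covers only conv+BN folding, not the merging of parallel branches that structural re-parameterization usually also performs.

```python
import numpy as np


def conv2d(x, w, b):
    """Naive 'valid' 2-D convolution (cross-correlation, stride 1).

    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)
    Returns an array of shape (C_out, H - k + 1, W - k + 1).
    """
    c_out, _, k, _ = w.shape
    h_out = x.shape[1] - k + 1
    w_out = x.shape[2] - k + 1
    y = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]  # (C_in, k, k)
            # Contract over (C_in, k, k) for all output channels at once.
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return y


def batchnorm_eval(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm over channel axis 0, using running statistics."""
    scale = gamma / np.sqrt(var + eps)
    return scale[:, None, None] * (y - mean[:, None, None]) + beta[:, None, None]


def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode BatchNorm into the preceding conv's weight and bias.

    BN(W*x + b) = s * (W*x + b - mean) + beta = (s*W)*x + (s*(b - mean) + beta),
    where s = gamma / sqrt(var + eps) is one factor per output channel.
    """
    scale = gamma / np.sqrt(var + eps)
    w_fused = w * scale[:, None, None, None]
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c_in, c_out, k = 3, 4, 3
    x = rng.standard_normal((c_in, 8, 8))
    w = rng.standard_normal((c_out, c_in, k, k))
    b = rng.standard_normal(c_out)
    gamma, beta = rng.standard_normal(c_out), rng.standard_normal(c_out)
    mean, var = rng.standard_normal(c_out), rng.uniform(0.5, 2.0, c_out)

    # Two layers (conv, then BN) versus a single fused conv.
    reference = batchnorm_eval(conv2d(x, w, b), gamma, beta, mean, var)
    fused = conv2d(x, *fuse_conv_bn(w, b, gamma, beta, mean, var))
    print(np.allclose(reference, fused))  # True
```

The fused model computes the same output with one layer instead of two. That removes a full pass over the feature map per fused pair, which is where the inference-speed gain mentioned above comes from.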

I would appreciate it if you could share your thoughts on this choice.