THU-MIG / RepViT

RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
https://arxiv.org/abs/2307.09283
Apache License 2.0

speed gpu #50

Open zzyy520 opened 3 months ago

zzyy520 commented 3 months ago

Hello, I have a few questions and observations. I benchmarked RepViT models of different sizes on a 2080Ti GPU, and they do not show higher speed than MobileOne-S2, MobileOne-S1, or FastViT-T8. Both throughput and FPS are slower than those of comparable-accuracy models. (I chose these models for comparison mainly because they all use structural re-parameterization.)

jameslahm commented 3 months ago
Thanks for your interest. The benchmark results on our 2080Ti device are below:

| Model | Input | Throughput (bs=1024) |
| --- | --- | --- |
| RepViT-M0.9 | 224 | 2870 |
| FastViT-T8 | 256 | 2379 (bs=768 because of OOM at bs=1024) |
| MobileOne-S1 | 224 | 2745 |
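
For reference, a minimal sketch of how this kind of fixed-batch-size throughput can be measured (the timm model id and whether the model is already fused/re-parameterized are assumptions here, not necessarily the repo's exact benchmark script):

```python
import time
import torch
import timm  # assumes a timm version that provides the RepViT models

@torch.no_grad()
def throughput(model_name, batch_size=1024, resolution=224, iters=30, warmup=10):
    model = timm.create_model(model_name).cuda().eval()
    x = torch.randn(batch_size, 3, resolution, resolution, device='cuda')
    for _ in range(warmup):            # warm-up runs (cuDNN autotuning, caches)
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()           # make sure all GPU work has finished
    return batch_size * iters / (time.time() - start)  # images per second

print(throughput('repvit_m0_9'))       # hypothetical model id; adjust to your setup
```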

Could you provide more details about your benchmark results?

zzyy520 commented 3 months ago

Thanks for your reply. The benchmark results on our 2080Ti GPU are below:

| Model | Input | Throughput (bs=512) |
| --- | --- | --- |
| MobileOne-S2 | 160 | 4152 |
| MobileOne-S1 | 160 | 5523 |
| RepViT-M1 | 160 | 5522 |
| RepViT-M2 | 160 | 4708 |

With bs=1:

| Model | Input | Throughput (bs=1) |
| --- | --- | --- |
| MobileOne-S2 | 160 | 479 |
| MobileOne-S1 | 160 | 429 |
| RepViT-M1 | 160 | 200 |
| RepViT-M2 | 160 | 182 |
| FastViT-T8 | 160 | 325 |

Does this mean the model is hard to apply to a high-speed-camera setting, where single images are transmitted and inferred one at a time (i.e., bs=1 inference)?
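
For reference, the bs=1 case above was timed per image along these lines with CUDA events (a sketch; the timm model ids are assumptions):

```python
import torch
import timm  # assumes a timm version that provides the RepViT and MobileOne models

@torch.no_grad()
def latency_ms(model_name, resolution=160, iters=200, warmup=50):
    model = timm.create_model(model_name).cuda().eval()
    x = torch.randn(1, 3, resolution, resolution, device='cuda')
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):            # warm-up runs excluded from timing
        model(x)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        model(x)
    end.record()
    torch.cuda.synchronize()           # wait for the recorded events to complete
    return start.elapsed_time(end) / iters  # average milliseconds per image

for name in ['mobileone_s1', 'repvit_m1_0']:  # hypothetical model ids
    print(name, latency_ms(name))
```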

jameslahm commented 3 months ago

Thanks. We think it depends on the device. For example, RepViT-M0.9 runs as fast as MobileOne-S1 on an iPhone 12 with bs=1. On the 2080Ti with bs=1, we suggest profiling the model to locate the inference bottleneck. For example, the SE layer may add noticeable extra latency at bs=1 on the 2080Ti, unlike on the iPhone. Besides, we suggest improving the performance on the 2080Ti with TensorRT. We will also try to improve the performance of RepViT in this case.
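
For example, one quick way to locate such a bottleneck is torch.profiler (a rough sketch; the model id is an assumption, not a recipe from this repo):

```python
import torch
import timm  # assumes a timm version that provides the RepViT models
from torch.profiler import profile, ProfilerActivity

model = timm.create_model('repvit_m1_0').cuda().eval()  # hypothetical model id
x = torch.randn(1, 3, 160, 160, device='cuda')

with torch.no_grad():
    for _ in range(10):                # warm up so autotuning stays out of the profile
        model(x)
    torch.cuda.synchronize()
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        for _ in range(20):
            model(x)
        torch.cuda.synchronize()

# Operators sorted by GPU time; SE-related ops (global pooling, small FCs)
# would show up near the top if they dominate the bs=1 latency.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```

If the SE blocks or other small kernels dominate at bs=1, exporting the fused (re-parameterized) model to ONNX and building an FP16 engine with `trtexec --onnx=... --fp16` is one possible way to recover latency on the 2080Ti.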