Open zzyy520 opened 3 months ago
Thanks for your interest. The benchmark results on our 2080ti device are below:

| Model | Input | Throughput (bs=1024) |
|---|---|---|
| RepViT-M0.9 | 224 | 2870 |
| FastViT-T8 | 256 | 2379 (bs=768 due to OOM at bs=1024) |
| MobileOne-S1 | 224 | 2745 |
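For reference, throughput numbers like these are typically obtained by timing repeated batched forward passes after a warm-up. Below is a minimal, hypothetical sketch (the function name and structure are illustrative, not the repo's actual benchmark script; with a real GPU model you must also call `torch.cuda.synchronize()` before reading the clock):

```python
import time

def measure_throughput(model_fn, batch_size, n_warmup=10, n_iters=50):
    """Return images/second for a callable that processes one batch per call."""
    for _ in range(n_warmup):       # warm-up: exclude one-time setup costs
        model_fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        model_fn()
    elapsed = time.perf_counter() - start
    return n_iters * batch_size / elapsed

# Usage with a dummy "model" that sleeps 1 ms per batch:
if __name__ == "__main__":
    fps = measure_throughput(lambda: time.sleep(0.001), batch_size=1024)
    print(f"{fps:.0f} images/s")
```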
Could you provide more details about your benchmark results?
Thanks for your reply. The benchmark results on our 2080ti GPU are below:

| Model | Input | Throughput (bs=512) |
|---|---|---|
| MobileOne-s2 | 160 | 4152 |
| MobileOne-s1 | 160 | 5523 |
| RepViT-M1 | 160 | 5522 |
| RepViT-M2 | 160 | 4708 |

And with bs=1:

| Model | Input | Throughput (bs=1) |
|---|---|---|
| MobileOne-s2 | 160 | 479 |
| MobileOne-s1 | 160 | 429 |
| RepViT-M1 | 160 | 200 |
| RepViT-M2 | 160 | 182 |
| FastViT-T8 | 160 | 325 |
Does this mean the model is hard to apply to settings where images arrive and must be inferred one at a time (bs=1), as with a high-speed camera?
Thanks. We think it depends on the device. For example, RepViT-M0.9 runs as fast as MobileOne-S1 on an iPhone 12 with bs=1. On the 2080Ti with bs=1, we suggest first locating the inference bottleneck. For example, the SE layer may introduce noticeable extra latency at bs=1 on the 2080Ti, unlike on the iPhone. Besides, you could improve performance on the 2080Ti with TensorRT. We will also try to improve the performance of RepViT in this case.
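One rough way to locate such a bottleneck is to time candidate stages of the model separately. A minimal sketch with a generic per-stage timer (the stage names below are hypothetical placeholders; for real GPU modules each call must additionally be bracketed with `torch.cuda.synchronize()`):

```python
import time

def profile_stages(stages, n_warmup=10, n_iters=100):
    """Time each named callable separately; returns {name: mean seconds/call}."""
    results = {}
    for name, fn in stages:
        for _ in range(n_warmup):   # warm-up each stage before timing
            fn()
        start = time.perf_counter()
        for _ in range(n_iters):
            fn()
        results[name] = (time.perf_counter() - start) / n_iters
    return results

# Usage with dummy stages standing in for e.g. "backbone" and "se_layers":
if __name__ == "__main__":
    timings = profile_stages([
        ("backbone", lambda: time.sleep(0.002)),
        ("se_layers", lambda: time.sleep(0.001)),
    ])
    for name, t in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {t * 1e3:.2f} ms")
```

If one stage dominates the bs=1 time, that is the natural target for fusion or replacement before trying TensorRT.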
Hello, I have a few questions and observations. Benchmarking different sizes of RepViT on a 2080ti GPU, they do not show higher speed than MobileOne-s2, MobileOne-s1, or FastViT-T8. Both throughput and FPS are slower than these models of comparable accuracy. (I compared against these models mainly because they all use structural re-parameterization.)