MzeroMiko / VMamba

VMamba: Visual State Space Models, code is based on mamba
MIT License

Throughput comparison with ViT / CNN #72

Open MDD-0928 opened 6 months ago

MDD-0928 commented 6 months ago

Dear Authors,

Have you tested your model's throughput and compared it with ViTs and CNNs?

Thanks!

MzeroMiko commented 6 months ago

The throughput of the original model is quite low. I have not tested the latest models yet.

MDD-0928 commented 6 months ago

I tested the latest model for inference on a single NVIDIA 4090 GPU. Compared with TransReID (a ReID model based on ViT-B), VMamba seems slower (I only added a BatchNorm1d after the VMamba backbone). I thought maybe I made a mistake in my code or something... What do you think the reason might be? ![Uploading 1.png…]()
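For anyone reproducing this kind of comparison: GPU kernels launch asynchronously, so timing without warmup iterations and without synchronizing before reading the clock can give misleading numbers. The following is only a minimal sketch of a throughput measurement, not the script used by either poster. The helper name `measure_throughput` and its parameters are hypothetical. The `step` callable stands in for one forward pass on a fixed batch; on a CUDA device one would pass `torch.cuda.synchronize` as `sync`.

```python
import time
from typing import Callable, Optional


def measure_throughput(
    step: Callable[[], object],
    batch_size: int,
    warmup: int = 10,
    iters: int = 30,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return samples per second processed by repeated calls to step().

    `step` is assumed to run one forward pass on a batch of `batch_size`
    samples. `sync` is an optional barrier (for example
    torch.cuda.synchronize) called before the timer starts and before it
    stops, so that asynchronously queued work is included in the timing.
    """
    # Warmup: lets lazy initialization, autotuning, and caches settle.
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    return batch_size * iters / elapsed


if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs without a GPU or a model.
    def dummy_step() -> int:
        return sum(range(10_000))

    print(f"{measure_throughput(dummy_step, batch_size=8):.1f} samples/s")
```

To compare two backbones fairly, both should be measured with the same batch size, input resolution, precision, and in inference mode, on the same machine.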

MzeroMiko commented 6 months ago

@MDD-0928 I cannot see the PNG file in your comment; it only shows as "Uploading 1.png...".

Our v4 version is the fastest so far. On our machine, Swin at 224x224 resolution reaches a throughput of 410, while forwardtype=v3 reaches only 258. Switching to v4, which comes at no extra cost, gives 340 instead.

Note that in the default Swin setting, raising the image resolution also changes the window size, and attention cost is quadratic in the window size, whereas our model is purely linear in complexity. So our model will be relatively faster at high resolutions.
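A rough back-of-the-envelope sketch of this scaling argument, counting only token interactions in the first stage and ignoring channels, MLPs, constants, and kernel efficiency. It assumes a patch size of 4 and a window of 7 at 224x224 (the Swin defaults), that the window grows in proportion to the resolution, and four scan directions for the linear model. The function names are hypothetical and the counts are not measured throughput.

```python
def stage1_tokens(resolution: int, patch_size: int = 4) -> int:
    """Number of tokens after patch embedding of a square image."""
    side = resolution // patch_size
    return side * side


def window_attention_pairs(
    resolution: int,
    patch_size: int = 4,
    base_resolution: int = 224,
    base_window: int = 7,
) -> int:
    """Token-pair interactions for windowed attention.

    Assumption: the window side scales with resolution, so each token
    attends to window * window tokens inside its own window.
    """
    window = base_window * resolution // base_resolution
    return stage1_tokens(resolution, patch_size) * window * window


def linear_scan_steps(
    resolution: int, patch_size: int = 4, directions: int = 4
) -> int:
    """Sequential scan steps for a linear-complexity model.

    Assumption: each of `directions` scans visits every token once.
    """
    return directions * stage1_tokens(resolution, patch_size)


if __name__ == "__main__":
    base = 224
    for res in (224, 448, 896):
        attn_growth = window_attention_pairs(res) / window_attention_pairs(base)
        scan_growth = linear_scan_steps(res) / linear_scan_steps(base)
        print(f"{res}: attention x{attn_growth:.0f}, scan x{scan_growth:.0f}")
```

Under these assumptions, doubling the resolution multiplies the windowed-attention count by 16 but the scan count by only 4, which is why the gap is expected to narrow or reverse at high resolutions even if the linear model is slower at 224x224.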

We'll keep improving the speed in future versions.