NVlabs / MambaVision

Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
https://arxiv.org/abs/2407.08083

Question on SSM Model Removal and Performance Comparison #10

Closed Ryoo72 closed 2 months ago

Ryoo72 commented 2 months ago

I have read your excellent research with great interest. Thank you very much. I have one question. According to the "MambaOut" paper, performance improves when the SSM module is removed. Comparing the two tables below, the MambaOut paper appears to show better performance without the SSM (see the models with roughly 50M parameters). I am curious whether any experimental results are available where the SSM module was removed.

[Screenshots (2024-07-15): the two performance tables referenced above]
ahatamiz commented 2 months ago

Thanks @Ryoo72 for the note. We did not really focus on optimizing performance with respect to the number of parameters. Instead, our main focus is the throughput vs. accuracy trade-off, and our models achieve SOTA performance for this specific trade-off across different variants. It is also worth mentioning that competing on the number of parameters (or FLOPs) is not as difficult. The parameter count can be artificially reduced by using depth-wise conv layers instead of their dense counterparts in stages 1 and 2, while negatively impacting GPU utilization and throughput.
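As a rough illustration of that last point (the channel count and kernel size here are arbitrary, not taken from the MambaVision stages), swapping a dense convolution for a depth-wise one cuts the parameter count by roughly a factor of the channel count, even though it does not necessarily run faster on a GPU:

```python
import torch.nn as nn

channels = 128  # illustrative width, not an actual MambaVision stage width

# Dense 3x3 conv: every output channel mixes all input channels.
dense = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

# Depth-wise 3x3 conv: each channel is convolved independently (groups=channels).
depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(f"dense:      {n_params(dense):,} parameters")      # 147,584
print(f"depth-wise: {n_params(depthwise):,} parameters")  # 1,280
```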

Regarding the importance of the SSM, our specific formulation allows for higher throughput and lower memory utilization; although we did not report formal numbers for the latter, it can be verified quite easily. In addition, removing the SSM part from different models (replacing it with self-attention in this case) did not improve accuracy, yet it significantly reduced throughput.
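If you want to check the throughput side yourself, a rough harness along the following lines can be used (the batch size, resolution, and iteration counts below are illustrative defaults, not the exact protocol from the paper):

```python
import time
import torch

@torch.no_grad()
def measure_throughput(model, batch_size=128, resolution=224, warmup=10, iters=50):
    """Rough images-per-second estimate for a classification backbone on a single GPU."""
    model = model.cuda().eval()
    x = torch.randn(batch_size, 3, resolution, resolution, device="cuda")
    for _ in range(warmup):           # warm up kernels and the allocator
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()          # wait for all queued GPU work to finish
    return batch_size * iters / (time.time() - start)
```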

We believe this could be due to our specific formulation, in which two paths, one with the SSM and one without, are used in the same module to learn rich feature representations; the selective-scan SSM path and the non-SSM path are complementary in this case. A sketch of this dual-path idea is shown below.
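The following is a minimal sketch of the dual-path idea only, not the repository's actual MambaVisionMixer: the `ssm` argument is a placeholder for whatever selective-scan block is used, and the projections and the non-SSM branch are simplified.

```python
import torch
import torch.nn as nn

class DualPathMixer(nn.Module):
    """Illustrative two-path token mixer: half the channels go through an SSM
    (selective scan), the other half through a simple non-SSM branch, and the
    two outputs are concatenated and projected back to the model dimension."""

    def __init__(self, dim: int, ssm: nn.Module):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)   # shared input projection
        self.ssm_branch = ssm                # placeholder for a selective-scan block on dim // 2 channels
        self.conv_branch = nn.Sequential(    # non-SSM path: depth-wise conv + activation
            nn.Conv1d(dim // 2, dim // 2, kernel_size=3, padding=1, groups=dim // 2),
            nn.SiLU(),
        )
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        x = self.in_proj(x)
        x_ssm, x_conv = x.chunk(2, dim=-1)   # split channels across the two paths
        y_ssm = self.ssm_branch(x_ssm)       # SSM path (sequence mixing)
        y_conv = self.conv_branch(x_conv.transpose(1, 2)).transpose(1, 2)
        return self.out_proj(torch.cat([y_ssm, y_conv], dim=-1))
```

As a quick smoke test, `DualPathMixer(dim=64, ssm=nn.Identity())(torch.randn(2, 196, 64))` returns a `(2, 196, 64)` tensor; in the actual model the `ssm` module would be a selective-scan block rather than an identity.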

Ryoo72 commented 2 months ago

Thanks for the reply!