MzeroMiko / VMamba

VMamba: Visual State Space Models, code is based on mamba
MIT License
1.83k stars 100 forks

Train Throughput #157

Closed liu-wei-song closed 2 months ago

liu-wei-song commented 2 months ago

Hello, thank you very much for your excellent work. I would like to ask how the training throughput is measured, and why Mamba appears to be slower than other work.

liu-wei-song commented 2 months ago

I would also like to ask why the values are not stable during Bidi-Scan?

MzeroMiko commented 2 months ago

The scripts for throughput and train throughput are in 'analyze/tp.log'.

The train throughput includes the model forward pass, the loss forward and backward passes, and the model backward pass. We did not include the time taken by the optimizer.
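To make the measurement described above concrete, here is a minimal timing sketch. It is not the repository's script: the function name `measure_train_throughput` and its parameters (`train_step`, `warmup`, `iters`, `sync`) are hypothetical, and it only assumes that one training step is a callable that runs forward, loss, and backward without an optimizer step.

```python
import time


def measure_train_throughput(train_step, batch_size, warmup=3, iters=10, sync=None):
    """Return samples per second over `iters` calls of `train_step`.

    train_step: callable that runs model forward, loss, and backward
                (hypothetical; no optimizer step is included).
    sync:       optional callable that blocks until queued device work
                has finished, so asynchronous kernels are counted.
    """
    # Warm-up iterations are excluded from timing (caches, lazy init).
    for _ in range(warmup):
        train_step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        train_step()
    if sync is not None:
        sync()  # wait for pending work before stopping the clock
    elapsed = time.perf_counter() - start

    return batch_size * iters / elapsed
```

With PyTorch on a GPU, `train_step` could be something like `lambda: criterion(model(x), y).backward()` and `sync` could be `torch.cuda.synchronize`. Without the synchronization call, the timer may stop before the GPU has finished, which inflates the reported throughput.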

The low training throughput of vision models with SSMs may be due to less efficient parallelism than plain matrix multiplication, which is widely used by linear, convolution, and attention layers.

But as the resolution rises, this situation changes, because the SSM has linear complexity in the number of tokens, whereas the attention mechanism has quadratic complexity.
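A rough operation count illustrates the scaling argument. This is a back-of-the-envelope sketch, not a measurement: the patch size of 4, channel width of 96, and state size of 16 are assumed values for illustration, constants are ignored, and the count says nothing about how well each operation parallelizes on real hardware.

```python
def token_count(resolution, patch_size=4):
    """Number of tokens for a square image (patch_size is an assumed value)."""
    return (resolution // patch_size) ** 2


def attention_cost(num_tokens, dim):
    """Approximate multiply-adds for QK^T plus attention-weighted V: ~2 * L^2 * d."""
    return 2 * num_tokens ** 2 * dim


def scan_cost(num_tokens, dim, state_size=16):
    """Approximate multiply-adds for a selective scan: ~L * d * N (N is assumed)."""
    return num_tokens * dim * state_size


if __name__ == "__main__":
    dim = 96  # assumed channel width
    for res in (224, 448, 896):
        L = token_count(res)
        ratio = attention_cost(L, dim) / scan_cost(L, dim)
        print(f"resolution={res} tokens={L} attention/scan op ratio={ratio:.0f}")
```

Under these assumptions the ratio is 2L/N, so it grows about fourfold each time the resolution doubles. Raw operation count is not wall-clock time, though: attention's matrix multiplications run more efficiently on GPUs, which is why the scan can still be slower at low resolution.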

MzeroMiko commented 2 months ago

> I would also like to ask why the values are not stable during Bidi-Scan?

Sorry, I do not know for now.

liu-wei-song commented 2 months ago

Thank you very much.
