liu-wei-song closed this issue 2 months ago
I would also like to ask why the values are not stable during Bidi-Scan?
The scripts for throughput and train throughput are in 'analyze/tp.log'.
The train throughput includes the model forward pass, the loss forward and backward passes, and the model backward pass. We did not include the time taken by the optimizer.
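For readers who want to reproduce a number like this, here is a minimal timing sketch. It is not the script from this repository: the helper name `measure_train_throughput` and the `step_fn`/`sync` parameters are hypothetical. It assumes `step_fn` runs one forward pass, the loss, and the backward pass, and does not call the optimizer.

```python
import time


def measure_train_throughput(step_fn, batch_size, warmup=3, iters=10, sync=None):
    """Return samples per second for `step_fn`.

    `step_fn` is assumed to run one training step WITHOUT the optimizer
    update (model forward, loss, backward). `sync` is an optional callable
    that blocks until pending device work has finished.
    """
    def _sync():
        if sync is not None:
            sync()

    # Warm-up iterations are run but not timed.
    for _ in range(warmup):
        step_fn()
    _sync()

    start = time.perf_counter()
    for _ in range(iters):
        step_fn()
    _sync()  # wait for queued work before stopping the clock
    elapsed = time.perf_counter() - start

    return batch_size * iters / elapsed
```

With PyTorch on a GPU, `step_fn` would typically wrap something like `loss = criterion(model(x), y); loss.backward()`, and `sync` would be `torch.cuda.synchronize`, since CUDA kernels run asynchronously and the timer would otherwise stop too early.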
The low training throughput of vision models with SSM may be due to less efficient parallelism compared with plain matrix multiplication, which is widely used by linear, convolution, and attention layers.
As the resolution rises, however, this situation changes, because SSM has linear complexity in the number of tokens, whereas the attention mechanism has quadratic complexity.
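To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. It assumes square images split into 16x16 patches (the patch size and the 224 baseline are assumptions for illustration, not taken from this thread) and ignores constant factors and the channel dimension, so it shows growth rates only, not real speed.

```python
def token_count(resolution, patch_size=16):
    """Number of patch tokens for a square image (patch_size is an assumption)."""
    return (resolution // patch_size) ** 2


def relative_cost(resolution, base_resolution=224, patch_size=16):
    """Cost growth relative to the base resolution, constants ignored.

    'linear' stands in for a scan whose cost grows with the token count,
    'quadratic' for attention whose cost grows with the token count squared.
    """
    tokens = token_count(resolution, patch_size)
    ratio = tokens / token_count(base_resolution, patch_size)
    return {"tokens": tokens, "linear": ratio, "quadratic": ratio ** 2}


for res in (224, 448, 896):
    print(res, relative_cost(res))
```

Under these assumptions, doubling the resolution from 224 to 448 quadruples the token count (196 to 784), so the linear term grows 4x while the quadratic term grows 16x. This is why the gap can narrow or reverse at high resolution even if the scan is slower per token.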
I would also like to ask why the values are not stable during Bidi-Scan?
Sorry, I do not know for now.
Thank you very much.
Hello, thank you very much for your excellent work. I would like to ask how the training throughput is measured. Why does Mamba appear to be slower than other work?