aifeixingdelv opened this issue 3 weeks ago
I ran an inference-speed experiment on the two kinds of models above. The model using VSSEncoder has a similar parameter count but larger GFLOPs than the model using the Transformer encoder variant. However, their inference speeds (Hz) are nearly identical. I am curious: shouldn't smaller GFLOPs result in faster inference?
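For context, FLOPs only count arithmetic; wall-clock speed also depends on memory traffic, kernel launch overhead, and how well the computation parallelizes on the hardware. A rough way to measure throughput (a minimal sketch; `measure_throughput` is a hypothetical helper, not from this repo):

```python
import time
import torch

def measure_throughput(model, input_shape=(1, 3, 224, 224), n_warmup=20, n_runs=100):
    """Rough wall-clock throughput probe (hypothetical helper, not part of the repo)."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    with torch.no_grad():
        for _ in range(n_warmup):       # warm up kernels and caches first
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()    # GPU work is asynchronous; sync before timing
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return n_runs / elapsed             # samples/s, i.e. Hz at this batch size
```

Two models with different GFLOPs can land at nearly the same Hz here when both are bottlenecked by memory bandwidth or kernel overhead rather than arithmetic.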
Compared to linear Transformers, Mamba-based models may have better interpretability and better performance, but they have no advantage in FLOPs or inference speed.
Inference speed also depends on how the model is implemented. For example, the FLOPs of Mamba could be smaller if we used a vanilla for loop to implement the state transfer, but the author of Mamba ultimately chose to double the FLOPs of that procedure in order to implement it in a more parallel and more efficient manner; the trade-off is sketched below.
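To make that concrete, here is a toy version of the sequential state transfer (a simplified sketch with a diagonal state matrix and made-up shapes, not the official selective-scan kernel):

```python
import torch

def ssm_loop(x, A, B, C):
    """
    Vanilla sequential state transfer: h_t = A*h_{t-1} + B*x_t, y_t = C*h_t.
    Toy sketch only (diagonal A, no input-dependent parameters). Mamba's
    official CUDA kernel replaces this loop with a parallel scan that does
    roughly twice the arithmetic but parallelizes over the sequence length.
    x: (batch, seq_len, d); A, B, C: (d,)
    """
    batch, seq_len, d = x.shape
    h = torch.zeros(batch, d, dtype=x.dtype, device=x.device)
    ys = []
    for t in range(seq_len):        # O(L) sequential steps: minimal FLOPs, but slow on a GPU
        h = A * h + B * x[:, t]     # state transfer
        ys.append(C * h)            # readout
    return torch.stack(ys, dim=1)
```

The loop is the FLOPs-minimal form, but each step waits on the previous one, so it leaves the GPU mostly idle; that is why the higher-FLOPs parallel-scan implementation is faster in practice.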
Thanks for your reply!!
Do you feel VMamba will occupy more GPU memory than ResNet or a Transformer when training? One way to check is sketched below.
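A minimal sketch for comparing peak training memory across backbones (assuming PyTorch on CUDA; `peak_training_memory_mb` is a hypothetical helper):

```python
import torch

def peak_training_memory_mb(model, input_shape=(8, 3, 224, 224)):
    """Rough peak-GPU-memory probe for one training step (hypothetical helper)."""
    device = torch.device("cuda")
    model = model.to(device).train()
    x = torch.randn(*input_shape, device=device)
    torch.cuda.reset_peak_memory_stats(device)
    out = model(x)
    out.sum().backward()            # forward + backward, so activations are included
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated(device) / 1024**2
```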
Thanks for your nice contribution!! When I try to replace the Transformer block in a model with VSSEncoder (the Transformer uses factorized self-attention for its linear complexity, as done in CoaT, the paper titled "Co-Scale Conv-Attentional Image Transformers"), I find that at a similar parameter count the model with VSSEncoder has higher FLOPs:

Params: 5.843736M (ViT) vs 5.910544M (VMamba)
FLOPs: 1.654163812G (ViT) vs 2.754769032G (VMamba)

So I want to know: what are the advantages of VMamba over other Transformer models with near-linear complexity? Can VMamba achieve faster inference than those models?
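In case it helps reproduce the comparison, here is a minimal sketch of counting params/FLOPs with fvcore (my assumption about tooling, not necessarily what was used above; custom ops such as the selective scan may need a registered handler, otherwise they are skipped and the total is underestimated):

```python
import torch
from fvcore.nn import FlopCountAnalysis, parameter_count

def report_cost(model, input_shape=(1, 3, 224, 224)):
    """Print params and FLOPs for one forward pass; fvcore's counting
    conventions may differ from other tools, so compare like with like."""
    model.eval()
    x = torch.randn(*input_shape)
    flops = FlopCountAnalysis(model, x)
    print(f"Params: {parameter_count(model)[''] / 1e6:.6f}M")
    print(f"FLOPs:  {flops.total() / 1e9:.9f}G")
```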