I would like to know whether the Mamba block in your work includes a skip connection, as in VMamba. In the code, the residual tensor seems to serve this purpose, but the figure of the Mamba block in your paper does not show it. Did I misread something?
Yes, the residual is definitely used. As for Fig. 2, the caption states this: "We omit the initial normalization and the final residual for simplification."
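To make the omitted parts concrete, here is a minimal sketch of the pattern, assuming a pre-norm layout in which the input is normalized, passed through the mixer, and then added back to the unnormalized input. It is an illustration only, not the repository's code: `layer_norm`, `block`, and `zero_mixer` are hypothetical names, the normalization has no learnable parameters, and the mixer is a stand-in for the Mamba layer.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learnable scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def block(x, mixer):
    """Hypothetical block: initial normalization, mixer, then the final residual add."""
    residual = x                      # keep the unnormalized input for the skip path
    hidden = mixer(layer_norm(x))     # the part a simplified figure would show
    return [h + r for h, r in zip(hidden, residual)]


def zero_mixer(x):
    """Stand-in mixer that outputs zeros, so only the skip path contributes."""
    return [0.0 for _ in x]


out = block([1.0, 2.0, 3.0], zero_mixer)
```

With a mixer that outputs zeros, the block returns its input unchanged, which is the skip connection the question asks about.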