MzeroMiko / VMamba

VMamba: Visual State Space Models, code is based on mamba
MIT License

Effects of SSM_D_STATE and SSM_RATIO #74

Open DianCh opened 6 months ago

DianCh commented 6 months ago

Hi, I noticed that the model consumes a lot of memory. I am wondering whether you managed to reduce it by changing SSM_D_STATE or SSM_RATIO while maintaining or improving performance. Both were changed in configs/vssm1/vssm_tiny_224_0229.yaml compared to the original configs/vssm/vssm_tiny_224.yaml. Do they affect performance a lot?

MzeroMiko commented 6 months ago

The performance of vssm_tiny_224_0229 and 0230 is ~82.4 for both. Actually, I do not think it makes any difference whether D_STATE is 1 or 16 when D_STATE is small, but this still needs to be proved. SSM_RATIO affects performance a lot, that is for sure.
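As a rough illustration of why these two settings matter for size, the sketch below counts the elements that scale with D_STATE and SSM_RATIO in a Mamba-style parameterization. It is not taken from the VMamba code: the function name `ssm_sizes`, the four scan directions (`k_dirs=4`), `dt_rank = ceil(d_model / 16)`, and the example width `d_model=96` are all assumptions made for the example.

```python
import math

def ssm_sizes(d_model, ssm_ratio, d_state, k_dirs=4):
    """Count elements that depend on d_state and ssm_ratio (hypothetical helper)."""
    d_inner = int(ssm_ratio * d_model)      # expanded channel width
    dt_rank = math.ceil(d_model / 16)       # assumed "auto" rank for the dt projection
    return {
        "d_inner": d_inner,
        # A_log: one (d_inner, d_state) matrix per scan direction (assumed layout)
        "a_log": k_dirs * d_inner * d_state,
        # x_proj: maps d_inner -> dt_rank + 2 * d_state (dt, B, C) per direction
        "x_proj": k_dirs * d_inner * (dt_rank + 2 * d_state),
        # hidden-state elements carried per spatial position during the scan
        "state_elems": k_dirs * d_inner * d_state,
    }

for d_state in (16, 1):
    print(d_state, ssm_sizes(d_model=96, ssm_ratio=2.0, d_state=d_state))
```

Under these assumptions both the parameter counts and the per-position state grow linearly with D_STATE, while SSM_RATIO scales every entry through `d_inner`. Actual memory use also depends on the kernel implementation, batch size, and resolution, which this sketch does not model.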

HashmatShadab commented 5 months ago

The changes in v2 with respect to the v0 models seem to be changing D_STATE from 16 to 1 while adding the MLP branch (MLP_RATIO=4) in VSSBlock. Further, within the SS2D block, the output dimension of the in_proj layer is now d_inner instead of d_inner*2 (no skip connection inside the SS2D block).

Is this correct?
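To make the in_proj change concrete, here is a minimal sketch of the weight count for the two layouts described above. It assumes a bias-free linear layer and uses `d_model=96` with `ssm_ratio=2.0` purely as example values; the helper `in_proj_weights` and its `two_branch` flag are hypothetical names, not identifiers from the repository.

```python
def in_proj_weights(d_model, ssm_ratio, two_branch):
    """Weights in a bias-free in_proj layer (hypothetical helper)."""
    d_inner = int(ssm_ratio * d_model)
    # two_branch=True: output is d_inner * 2 (main path plus the extra branch)
    # two_branch=False: output is d_inner only
    out_features = 2 * d_inner if two_branch else d_inner
    return d_model * out_features

wide = in_proj_weights(d_model=96, ssm_ratio=2.0, two_branch=True)
narrow = in_proj_weights(d_model=96, ssm_ratio=2.0, two_branch=False)
print(wide, narrow)
```

With these example values, dropping the second branch halves the in_proj weight count and the size of its output activation.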

MzeroMiko commented 5 months ago

Yes, you are basically correct.

HashmatShadab commented 5 months ago

Thanks! Could you please elaborate on the reasons for changing D_STATE from 16 to 1, and share any insights you have on the effect of varying D_STATE?