Open DianCh opened 6 months ago
The performance of vssm_tiny_224_0229 and 0230 is ~82.4 in both cases. Actually, I do not think it makes any difference whether D_STATE=1 or 16 when D_STATE is small, but this still needs to be proved.
SSM_RATIO, however, definitely affects performance a lot.
The changes in the v2 models with respect to v0 seem to be setting D_STATE from 16 to 1 while adding the MLP branch (MLP_RATIO=4) in VSSBlock. Further, within the SS2D block, the in_proj layer output dimension is now d_inner instead of d_inner*2 (no skip connection inside the SS2D block).
Is this correct?
Yes, that is basically correct.
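To make the in_proj change described above concrete, here is a minimal sketch in plain Python that counts the weights of the projection in both variants. It assumes d_inner = int(SSM_RATIO * d_model), following Mamba's `expand` convention; the function name `in_proj_params` and the values d_model=96 and ssm_ratio=2.0 are hypothetical and chosen only for illustration, not taken from the configs.

```python
def in_proj_params(d_model: int, ssm_ratio: float, with_gate: bool,
                   bias: bool = False) -> int:
    """Count parameters of a linear in_proj layer.

    with_gate=True  -> output is 2 * d_inner (v0-style, extra branch in SS2D)
    with_gate=False -> output is d_inner     (v2-style, no extra branch)
    """
    # Assumption: d_inner is derived from SSM_RATIO like Mamba's `expand`.
    d_inner = int(ssm_ratio * d_model)
    out_features = 2 * d_inner if with_gate else d_inner
    return d_model * out_features + (out_features if bias else 0)


# Hypothetical sizes, for illustration only.
d_model = 96
v0_style = in_proj_params(d_model, ssm_ratio=2.0, with_gate=True)
v2_style = in_proj_params(d_model, ssm_ratio=2.0, with_gate=False)
print(v0_style, v2_style)
```

Under these assumptions, dropping the second branch halves the in_proj weights for a fixed SSM_RATIO; the MLP branch added to VSSBlock is not counted here.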
Thanks! Can you please elaborate on what the reasons were for setting D_STATE from 16 to 1, and share any insights you have on the effect of varying D_STATE?
Hi, I noticed that the model consumes a lot of memory, and I am wondering if you managed to reduce it by changing SSM_D_STATE or SSM_RATIO while maintaining or improving the performance (they were changed in configs/vssm1/vssm_tiny_224_0229.yaml compared to the original configs/vssm/vssm_tiny_224.yaml). Do they affect the performance a lot?
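As a rough way to reason about this question, the sketch below estimates how the per-token recurrent state of the selective scan scales with the two settings. It assumes the state holds d_inner * d_state elements per scan direction (as in Mamba, where the hidden state has shape (batch, d_inner, d_state)), that d_inner = int(SSM_RATIO * d_model), and that SS2D scans in 4 directions. The function name and the sizes used are hypothetical, and actual GPU memory also depends on activations and on the scan kernel, which this does not model.

```python
def ssm_state_elems(d_model: int, ssm_ratio: float, d_state: int,
                    directions: int = 4) -> int:
    """Per-token recurrent state size of the scan, in elements."""
    # Assumption: d_inner = SSM_RATIO * d_model (Mamba's `expand` convention).
    d_inner = int(ssm_ratio * d_model)
    # Assumption: one (d_inner, d_state) state per scan direction.
    return directions * d_inner * d_state


# Hypothetical sizes, for illustration only.
large_state = ssm_state_elems(d_model=96, ssm_ratio=2.0, d_state=16)
small_state = ssm_state_elems(d_model=96, ssm_ratio=2.0, d_state=1)
print(large_state, small_state, large_state // small_state)
```

Under these assumptions the state size is linear in both D_STATE and SSM_RATIO, so going from D_STATE=16 to D_STATE=1 shrinks this particular term by a factor of 16.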