Journey7331 opened this issue 1 month ago
We say that S6 has a gating mechanism inside because we can rewrite the SSM into an attention-like form: $$(((Q \odot W)(\frac{K}{W})^T) \odot M)V$$ So $W$ acts like a gate added onto a "linear transform", and together they make the SSM a special kind of attention. For details, you can refer to the visualization section of the paper.
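Here is a minimal NumPy sketch of that rewrite (not the repo's actual code; the shapes of $Q$, $K$, $V$, $W$ and their mapping to the SSM's $C$, $B$, $x$, and cumulative decay are assumptions for illustration):

```python
import numpy as np

# Illustrative sketch of the attention-like form
#   y = (((Q ⊙ W) (K / W)^T) ⊙ M) V
# All names and shapes below are assumptions, not the repo's code:
#   L = sequence length, d = state/head dimension
L, d = 8, 4
rng = np.random.default_rng(0)

Q = rng.standard_normal((L, d))   # query-like term (e.g. C in the SSM)
K = rng.standard_normal((L, d))   # key-like term   (e.g. B in the SSM)
V = rng.standard_normal((L, d))   # value-like term (e.g. the input x)
# W: per-position gate, e.g. a cumulative decay; kept positive so K / W is safe
W = np.exp(rng.standard_normal((L, d)))

M = np.tril(np.ones((L, L)))      # causal mask, as in the formula above

# Plain (ungated) linear attention for comparison: ((Q K^T) ⊙ M) V
y_plain = ((Q @ K.T) * M) @ V

# Gated form: W enters Q multiplicatively and K divisively, so the score
# between positions i and j is scaled per channel by W[i] / W[j]
y_gated = (((Q * W) @ (K / W).T) * M) @ V

print(y_plain.shape, y_gated.shape)  # both (L, d)
```

Note that the division by $W$ on the key side means the score between positions $i$ and $j$ is scaled by $W_i / W_j$ per channel, so the gate acts as a relative, data-dependent decay rather than a static weight.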
Thanks for your quick reply :)
But are there any other ablation experiments or attention map comparisons that can support this? I think that would make the point easier to grasp. 👀 If there are any such studies, that would be greatly appreciated. ❤️
Hi @MzeroMiko, I see the following in your paper, and I also see `noz` in your model yaml. How should I understand this statement:

> the gating mechanism has already been implemented by the selectivity of SS2D

Are there any lines of code that could explain this?