Biqing-Qi / Exploring-Adversarial-Robustness-of-Deep-State-Space-Models

[NeurIPS 2024] Exploring Adversarial Robustness of Deep State Space Models

Use of residual in S6_SSM #2

Open TalRub104 opened 4 days ago

TalRub104 commented 4 days ago

Hi, in the S6_SSM forward function, you do the following:

    for layer in self.mapping_layers:
        residual = x
        x = layer(x, state)
        x = residual + x
        x = self.norms[num_layer](x)
        num_layer += 1

In your paper, you did not specify that you add a residual to the Mamba layer output. Could you clarify why you chose to do that?

gjq100 commented 3 days ago

Thank you for your feedback. We followed the S4 CIFAR-10 training example (https://github.com/state-spaces/s4/blob/main/example.py) for the module connections, and residual connections are currently among the most common ways to stack modules. To keep the structures of all models aligned, we used residual connections uniformly. The one exception is Mega: its basic block already includes a gated residual connection, so we did not add another one there.
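
For readers following along, the stacking pattern discussed here (apply the layer, add the skip connection, then normalize) can be sketched without any framework. This is only an illustration under stated assumptions, not the repository's code: `layer_norm`, `stack_forward`, `toy_layer`, and the `has_internal_residual` flag are hypothetical names, plain Python lists stand in for tensors, and the normalization has no learned parameters.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def stack_forward(x, layers, has_internal_residual=False):
    """Run x through a stack: layer -> (optional) skip connection -> norm."""
    for layer in layers:
        residual = x
        x = layer(x)
        # Skip the outer residual when the block already has its own
        # (assumption: this mirrors the Mega case described in the reply).
        if not has_internal_residual:
            x = [r + v for r, v in zip(residual, x)]
        x = layer_norm(x)
    return x


def toy_layer(x):
    """Hypothetical stand-in for a Mamba/S6 block: a simple scaling."""
    return [0.5 * v for v in x]


out = stack_forward([1.0, 2.0, 3.0, 4.0], [toy_layer, toy_layer])
```

The skip connection is applied before the normalization, which matches the order in the snippet quoted in the question. Passing `has_internal_residual=True` leaves the layer output untouched before normalization.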