chengkai-liu / Mamba4Rec

[RelKD'24] Mamba4Rec: Towards Efficient Sequential Recommendation with Selective State Space Models
https://arxiv.org/abs/2403.03900
MIT License

About residual #9

Closed · AlwaysFHao closed this 3 months ago

AlwaysFHao commented 3 months ago

Hello, I would like to ask why no residual connection is needed when there is only a single Mamba layer. Is it because the network is shallow enough that the residual is unnecessary, or is there a paper supporting this? SASRec's TransformerEncoder does not make this distinction.

```python
if self.num_layers == 1:
    # one Mamba layer without residual connection
    hidden_states = self.LayerNorm(self.dropout(hidden_states))
else:
    # stacked Mamba layers with residual connections
    hidden_states = self.LayerNorm(self.dropout(hidden_states) + input_tensor)
hidden_states = self.ffn(hidden_states)
```

chengkai-liu commented 3 months ago

Residual connections are typically used in deeper architectures. For Mamba4Rec, the architecture was obtained through experiments and empirical validation. You can also use residual connections for a single-layer architecture.
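For anyone who wants to try that, here is a minimal sketch of the single-layer variant with the residual kept (the class and parameter names are illustrative, not taken from the repository):

```python
import torch
import torch.nn as nn

class AlwaysResidualOutput(nn.Module):
    """Hypothetical variant that keeps the residual even with one Mamba layer."""

    def __init__(self, hidden_size: int, dropout_prob: float = 0.2):
        super().__init__()
        self.LayerNorm = nn.LayerNorm(hidden_size, eps=1e-12)
        self.dropout = nn.Dropout(dropout_prob)

    def forward(self, hidden_states: torch.Tensor, input_tensor: torch.Tensor) -> torch.Tensor:
        # Apply the residual connection unconditionally, regardless of num_layers.
        return self.LayerNorm(self.dropout(hidden_states) + input_tensor)
```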

AlwaysFHao commented 3 months ago

> Residual connections are typically used in deeper architectures. For Mamba4Rec, the architecture was obtained through experiments and empirical validation. You can also use residual connections for a single-layer architecture.

Okay, but the residual connection in Mamba4Rec's FFN layer is not special-cased for the single-layer setting. May I ask why that is?

chengkai-liu commented 3 months ago

I haven't tried an FFN without a residual connection. I directly use the FFN design from the Transformer.
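For reference, a minimal sketch of the Transformer-style position-wise FFN with its unconditional residual connection (names and hyperparameters are illustrative; see the repository for the actual implementation):

```python
import torch
import torch.nn as nn

class PositionwiseFeedForward(nn.Module):
    """Transformer-style FFN: two linear maps, then residual add and LayerNorm."""

    def __init__(self, hidden_size: int, inner_size: int, dropout_prob: float = 0.2):
        super().__init__()
        self.w1 = nn.Linear(hidden_size, inner_size)
        self.w2 = nn.Linear(inner_size, hidden_size)
        self.activation = nn.GELU()
        self.dropout = nn.Dropout(dropout_prob)
        self.LayerNorm = nn.LayerNorm(hidden_size, eps=1e-12)

    def forward(self, input_tensor: torch.Tensor) -> torch.Tensor:
        hidden_states = self.w2(self.dropout(self.activation(self.w1(input_tensor))))
        # The residual is applied for any depth, following the Transformer design.
        return self.LayerNorm(self.dropout(hidden_states) + input_tensor)
```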

AlwaysFHao commented 3 months ago

> I haven't tried an FFN without a residual connection. I directly use the FFN design from the Transformer.

Ok, thank you very much!