Closed AlwaysFHao closed 6 months ago
Residual connections are typically used in deeper architectures. For Mamba4Rec, the architecture was chosen through experiments and empirical validation. You can also use residual connections in a single-layer architecture.
Okay, but the residual connection of the FFN layer in Mamba4Rec is not special-cased for a single layer. May I ask why?
I haven't tried an FFN without a residual connection. I directly use the design of FFN from Transformer.
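For readers unfamiliar with that design, here is a minimal, dependency-free sketch of a Transformer-style position-wise FFN in which the residual is always applied, whatever the number of layers. It is an illustration under assumptions, not the Mamba4Rec code: the function names, the ReLU activation, and the tiny weight matrices are hypothetical, dropout is omitted, and `layer_norm` has no learnable parameters.

```python
import math


def layer_norm(x, eps=1e-12):
    """Normalize a vector to zero mean and unit variance (no learnable scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weight, bias):
    """Compute weight @ x + bias for a single vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def ffn_with_residual(x, w1, b1, w2, b2):
    """Post-norm FFN: LayerNorm(x + W2 * ReLU(W1 * x + b1) + b2)."""
    inner = [max(0.0, v) for v in linear(x, w1, b1)]    # expand, then ReLU
    out = linear(inner, w2, b2)                         # project back to d_model
    return layer_norm([o + v for o, v in zip(out, x)])  # residual, then normalize


# Illustrative weights only: d_model = 2, inner size = 3.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[0.5, 0.0, 0.5], [0.0, 0.5, -0.5]]
b2 = [0.0, 0.0]

y = ffn_with_residual([1.0, 2.0], w1, b1, w2, b2)
```

Removing the `+ v` term in the last line of `ffn_with_residual` would give the no-residual variant that has not been tried here.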
OK, thank you very much.
Hello, I would like to ask why no residual connection is used when there is only one Mamba layer. Is it because the network is shallow and the residual is unnecessary, or is there a paper that supports this choice? The TransformerEncoder in SASRec does not special-case a single layer this way.
if self.num_layers == 1:
    # one Mamba layer without residual connection
    hidden_states = self.LayerNorm(self.dropout(hidden_states))
else:
    # stacked Mamba layers with residual connections
    hidden_states = self.LayerNorm(self.dropout(hidden_states) + input_tensor)
hidden_states = self.ffn(hidden_states)
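To make the difference between the two branches in the snippet above concrete, here is a small, dependency-free sketch. It is not the Mamba4Rec implementation: `toy_mixer` is a hypothetical stand-in for the Mamba block, dropout is omitted (as it would be in evaluation mode), the FFN step is left out, and `layer_norm` is a plain-Python normalization without learnable parameters.

```python
import math


def layer_norm(x, eps=1e-12):
    """Normalize a vector to zero mean and unit variance (no learnable scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def toy_mixer(x):
    """Hypothetical stand-in for the Mamba block: any function of the input."""
    return [0.5 * v * v for v in x]


def mamba_layer(x, num_layers):
    """Mirror the num_layers branch (dropout and FFN omitted)."""
    hidden = toy_mixer(x)
    if num_layers == 1:
        # single layer: normalize the block output only
        return layer_norm(hidden)
    # stacked layers: add the layer input back before normalizing
    return layer_norm([h + v for h, v in zip(hidden, x)])


x = [1.0, -2.0, 3.0, 0.5]
no_residual = mamba_layer(x, num_layers=1)
with_residual = mamba_layer(x, num_layers=2)
```

The two results differ only by the `+ v` term: with the residual, the normalized output still carries the layer input directly, while the single-layer branch depends on the input only through the block.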