@LukeForeverYoung @MAGAer13
First of all, thanks for your great work.
I have a question regarding the Feed-Forward Network (FFN) of the Abstractor and the forward method of MplugOwlVisualAbstractorAttention.
From issue #10, I learned that the Abstractor uses an FFN that applies LLaMA's SwiGLU.
However, in mPLUG-Owl it uses LayerNorm instead of LLaMA's RMSNorm.
Is there a reason for this change? Is LayerNorm used instead of RMSNorm because the Abstractor is a module for processing images?
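For reference, this is a minimal sketch of the two variants as I understand them (dimensions and module names are placeholders, not the actual mPLUG-Owl implementation):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """LLaMA-style RMSNorm: scales by the root-mean-square, no mean subtraction, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * x * rms

class SwiGLUFFN(nn.Module):
    """SwiGLU feed-forward block: SiLU-gated projection, as in LLaMA's MLP."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))

# The question: why does the Abstractor pair this SwiGLU FFN with
# nn.LayerNorm(dim) rather than RMSNorm(dim), as LLaMA does?
```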
Also, as far as I know, MplugOwlVisualAbstractorAttention is designed based on the Q-Former from BLIP-2, yet its forward method contains the following code:
```python
# HACK we apply norm on q and k
hidden_states = self.norm1(hidden_states)
encoder_hidden_states = self.normk(encoder_hidden_states)
```
This normalization of the query and key hidden states does not exist in the Q-Former. Was there a problem in the implementation that made this addition necessary?
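To make sure I am reading this correctly, here is a simplified sketch of the difference I am asking about (the module names, LayerNorm placement, and the attention call are placeholders for the real implementation):

```python
import torch.nn as nn

class QFormerStyleCrossAttention(nn.Module):
    # BLIP-2 Q-Former style: the hidden states go into cross-attention directly;
    # there is no extra normalization of q/k inputs at this point.
    def __init__(self, dim, num_heads):
        super().__init__()
        self.attention = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, hidden_states, encoder_hidden_states):
        out, _ = self.attention(hidden_states, encoder_hidden_states, encoder_hidden_states)
        return out

class AbstractorStyleCrossAttention(nn.Module):
    # mPLUG-Owl Abstractor style (as I understand it): an extra LayerNorm is
    # applied to the query hidden states and to the encoder (key/value)
    # hidden states before attention -- the "HACK" in the snippet above.
    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.normk = nn.LayerNorm(dim)
        self.attention = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, hidden_states, encoder_hidden_states):
        hidden_states = self.norm1(hidden_states)
        encoder_hidden_states = self.normk(encoder_hidden_states)
        out, _ = self.attention(hidden_states, encoder_hidden_states, encoder_hidden_states)
        return out
```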