Open dshwei opened 3 years ago
Per the paper, `sublayerout = LayerNorm(x + Sublayer(x))`: the residual connection comes first, then layer normalization. So in your code, `sublayer.py` should be:

```python
def forward(self, x, sublayer):
    "Apply residual connection to any sublayer with the same size."
    return self.norm(x + self.dropout(sublayer(x)))
```

And in `transformer.py`:

```python
def forward(self, x, mask):
    x = self.input_sublayer(x, lambda _x: self.attention.forward(_x, _x, _x, mask=mask))
    x = self.output_sublayer(x, lambda _x: self.feed_forward.forward(_x))
    return self.dropout(x)
```

My reading of the paper differs from your implementation here; please point out anything I have wrong.
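For reference, the post-norm ordering described in the paper could be written as a standalone module like the sketch below. This is a hypothetical illustration, not the repo's code; the `size` and `dropout` constructor arguments and the class name are assumptions:

```python
import torch
import torch.nn as nn

class PostNormSublayerConnection(nn.Module):
    """Residual connection followed by layer norm, as in the original paper:
    LayerNorm(x + Dropout(Sublayer(x))). Hypothetical sketch, not the repo's module."""

    def __init__(self, size, dropout):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # Apply the sublayer, add the residual, then normalize (post-norm).
        return self.norm(x + self.dropout(sublayer(x)))

# Example: identity "sublayer" on a small batch.
layer = PostNormSublayerConnection(size=8, dropout=0.0)
x = torch.randn(2, 8)
out = layer(x, lambda t: t)
```

Because the norm is applied last, every output vector is normalized over its last dimension (mean approximately 0 with the default affine initialization).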
The transformer implementation here is the same as The Annotated Transformer. In `sublayer.py` there is a comment explaining this:

> Note for code simplicity the norm is first as opposed to last.

That is, the code intentionally applies the norm before the sublayer (pre-norm) rather than after the residual addition as in the paper:

```python
def forward(self, x, sublayer):
    "Apply residual connection to any sublayer with the same size."
    return x + self.dropout(sublayer(self.norm(x)))
```
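The two orderings are not equivalent, which can be checked with a framework-free sketch (numpy stands in for the PyTorch modules; `layer_norm` and the toy `sublayer` below are illustrative stand-ins, not the repo's code):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last dimension, as nn.LayerNorm does by default
    # (no learned scale/shift in this toy version).
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def sublayer(x):
    # Stand-in for attention / feed-forward: any shape-preserving map.
    return 2.0 * x

x = np.array([[1.0, 2.0, 3.0, 4.0]])

# Post-norm, as written in "Attention Is All You Need":
post_norm = layer_norm(x + sublayer(x))

# Pre-norm, as implemented in this repo (and The Annotated Transformer):
pre_norm = x + sublayer(layer_norm(x))
```

Post-norm output is always normalized (zero mean over the last dimension), while pre-norm output is a raw residual sum, so the two produce different values on the same input; with many layers this also changes training dynamics.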