Open dshwei opened 3 years ago
Per the paper, `sublayerout = LayerNorm(x + Sublayer(x))`: the residual connection comes first, then layer normalization. So in your code, `sublayer.py` should be:

```python
def forward(self, x, sublayer):
    "Apply residual connection to any sublayer with the same size."
    return self.norm(x + self.dropout(sublayer(x)))
```

And in `transformer.py`:

```python
def forward(self, x, mask):
    x = self.input_sublayer(x, lambda _x: self.attention.forward(_x, _x, _x, mask=mask))
    x = self.output_sublayer(x, lambda _x: self.feed_forward.forward(_x))
    return self.dropout(x)
```

My reading of the paper differs from your implementation here; please point out anything I have wrong.
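For reference, the post-norm ordering described in the paper could be written as a standalone module like the sketch below. This is a hypothetical illustration, not the repo's code; the `size` and `dropout` constructor arguments and the class name are assumptions:

```python
import torch
import torch.nn as nn

class PostNormSublayerConnection(nn.Module):
    """Residual connection followed by layer norm, as in the original paper:
    LayerNorm(x + Dropout(Sublayer(x))). Hypothetical sketch, not the repo's module."""

    def __init__(self, size, dropout):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # Apply the sublayer, add the residual, then normalize (post-norm).
        return self.norm(x + self.dropout(sublayer(x)))

# Example: identity "sublayer" on a small batch.
layer = PostNormSublayerConnection(size=8, dropout=0.0)
x = torch.randn(2, 8)
out = layer(x, lambda t: t)
```

Because the norm is applied last, every output vector is normalized over its last dimension (mean approximately 0 with the default affine initialization).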
The transformer implementation here is the same as The Annotated Transformer. In `sublayer.py` there is a comment explaining this:

> Note for code simplicity the norm is first as opposed to last.

That is, the code intentionally applies the norm before the sublayer (pre-norm) rather than after the residual addition as in the paper:

```python
def forward(self, x, sublayer):
    "Apply residual connection to any sublayer with the same size."
    return x + self.dropout(sublayer(self.norm(x)))
```
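The two orderings are not equivalent, which can be checked with a framework-free sketch (numpy stands in for the PyTorch modules; `layer_norm` and the toy `sublayer` below are illustrative stand-ins, not the repo's code):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last dimension, as nn.LayerNorm does by default
    # (no learned scale/shift in this toy version).
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def sublayer(x):
    # Stand-in for attention / feed-forward: any shape-preserving map.
    return 2.0 * x

x = np.array([[1.0, 2.0, 3.0, 4.0]])

# Post-norm, as written in "Attention Is All You Need":
post_norm = layer_norm(x + sublayer(x))

# Pre-norm, as implemented in this repo (and The Annotated Transformer):
pre_norm = x + sublayer(layer_norm(x))
```

Post-norm output is always normalized (zero mean over the last dimension), while pre-norm output is a raw residual sum, so the two produce different values on the same input; with many layers this also changes training dynamics.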