codertimo / BERT-pytorch

Google AI 2018 BERT pytorch implementation
Apache License 2.0

The SublayerConnection class called by the forward method in transformer.py: implementation of the residual connection and layer normalization #83

Open dshwei opened 3 years ago

dshwei commented 3 years ago

In the paper, sublayer_out = LayerNorm(x + sublayer(x)): the residual connection comes first, then layer normalization. In your code, sublayer.py has:

    def forward(self, x, sublayer):
        "Apply residual connection to any sublayer with the same size."
        return x + self.dropout(sublayer(self.norm(x)))

but I think it should be:

        return self.norm(x + self.dropout(sublayer(x)))

In transformer.py:

    def forward(self, x, mask):
        x = self.input_sublayer(x, lambda _x: self.attention.forward(_x, _x, _x, mask=mask))
        x = self.output_sublayer(x, lambda _x: self.feed_forward.forward(_x))
        return self.dropout(x)

My understanding of the paper differs from yours here; please point out anything I have gotten wrong.

Bowen-n commented 3 years ago

The transformer implementation is the same as The Annotated Transformer.

In sublayer.py, there is a comment explaining this: "Note for code simplicity the norm is first as opposed to last."
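To make the difference between the two orderings concrete, here is a minimal sketch in plain Python (no PyTorch, dropout omitted, and no learned scale/shift in the norm; `layer_norm`, `post_norm`, and `pre_norm` are hypothetical helper names, not functions from this repo). `post_norm` is the ordering written in the BERT/Transformer paper, LayerNorm(x + Sublayer(x)); `pre_norm` is the ordering used in this repo and The Annotated Transformer, x + Sublayer(LayerNorm(x)):

```python
def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def post_norm(x, sublayer):
    # Paper ordering: LayerNorm(x + Sublayer(x))
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])

def pre_norm(x, sublayer):
    # Repo / Annotated Transformer ordering: x + Sublayer(LayerNorm(x))
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]

if __name__ == "__main__":
    x = [1.0, 2.0, 4.0]
    double = lambda v: [2.0 * a for a in v]  # stand-in for attention/FFN
    print("post-norm:", post_norm(x, double))
    print("pre-norm: ", pre_norm(x, double))
```

Note the observable difference: the post-norm output is always normalized (zero mean), while the pre-norm output keeps the raw residual stream unnormalized, which is exactly what the comment in sublayer.py is alluding to.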