Why do the residual connections in OpenNMT use un-normalized inputs?

OpenNMT / OpenNMT-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

https://opennmt.net/

MIT License


Why do the residual connections in OpenNMT use un-normalized inputs? #1375

Closed liyc7711 closed 5 years ago

liyc7711 commented 5 years ago

Why do the residual connections in OpenNMT's Transformer model (including OpenNMT-tf) use the un-normalized inputs? For example:

    input_norm = self.layer_norm(inputs)
    context, _ = self.self_attn(input_norm, input_norm, input_norm, mask=mask, type="self")
    out = self.dropout(context) + inputs  # why not add input_norm like tensor2tensor
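
To make the two options in the question concrete, here is a minimal sketch that contrasts adding the raw `inputs` with adding `input_norm` in the residual. It is not OpenNMT code: plain Python lists stand in for tensors so it runs without PyTorch, `layer_norm` has no learned gain or bias, and `sublayer` is a hypothetical stand-in for self-attention plus dropout.

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    """Hypothetical stand-in for self-attention plus dropout: halves each value."""
    return [0.5 * v for v in x]

def residual_raw(inputs):
    """Variant from the snippet: normalize for the sublayer, add the raw inputs."""
    input_norm = layer_norm(inputs)
    return [c + i for c, i in zip(sublayer(input_norm), inputs)]

def residual_normalized(inputs):
    """Variant the question asks about: add the normalized inputs instead."""
    input_norm = layer_norm(inputs)
    return [c + n for c, n in zip(sublayer(input_norm), input_norm)]

inputs = [10.0, 12.0, 14.0]
out_raw = residual_raw(inputs)          # keeps the offset and scale of `inputs`
out_norm = residual_normalized(inputs)  # offset and scale of `inputs` are discarded
```

In this toy setup the raw-residual output keeps the mean of `inputs` (about 12), while the normalized-residual output has a mean of about 0, because the normalization has already removed the input's offset before the addition. Whether that matters for training is the subject of the linked discussion, not something this sketch settles.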

guillaumekln commented 5 years ago

See https://github.com/OpenNMT/OpenNMT-py/issues/770