Why do the residual connections in OpenNMT's Transformer model (including OpenNMT-tf) add the un-normalized inputs rather than the layer-normalized ones? For example:
```python
input_norm = self.layer_norm(inputs)  # pre-norm: LayerNorm is applied before self-attention
context, _ = self.self_attn(input_norm, input_norm, input_norm,
                            mask=mask, type="self")
out = self.dropout(context) + inputs  # why not add input_norm, like tensor2tensor?
```
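To make the difference concrete, here is a minimal pure-Python sketch. It assumes a LayerNorm with no learned gain/bias, omits dropout, and uses hypothetical toy sublayers in place of attention or feed-forward blocks. None of these names are OpenNMT APIs. `pre_norm_residual` follows the snippet above, x + F(LN(x)). `normed_skip_residual` is the alternative the question asks about, LN(x) + F(LN(x)). With a sublayer that contributes nothing, the first form returns the raw input unchanged. The second returns LN(x), so the input's scale and offset are lost at every layer.

```python
import math

def layer_norm(x, eps=1e-6):
    """LayerNorm without learned gain/bias: zero mean, unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [u + v for u, v in zip(a, b)]

def pre_norm_residual(x, sublayer):
    """Form in the snippet: x + F(LN(x)); the skip path carries the raw x."""
    return add(x, sublayer(layer_norm(x)))

def normed_skip_residual(x, sublayer):
    """Alternative from the question: LN(x) + F(LN(x)); the skip path carries LN(x)."""
    x_norm = layer_norm(x)
    return add(x_norm, sublayer(x_norm))

def run_stack(x, sublayers, residual):
    """Apply the chosen residual form once per sublayer, in order."""
    for f in sublayers:
        x = residual(x, f)
    return x

# Hypothetical toy sublayers standing in for attention / feed-forward blocks.
def zero_sublayer(v):
    return [0.0] * len(v)

def shift_sublayer(v):
    return [1.0] * len(v)

x = [10.0, 20.0, 30.0]
print(pre_norm_residual(x, zero_sublayer))     # [10.0, 20.0, 30.0]: input passes through untouched
print(normed_skip_residual(x, zero_sublayer))  # approx [-1.2247, 0.0, 1.2247]: scale/offset lost
print(run_stack(x, [shift_sublayer] * 3, pre_norm_residual))     # [13.0, 23.0, 33.0]
print(run_stack(x, [shift_sublayer] * 3, normed_skip_residual))  # approx [-0.2247, 1.0, 2.2247]
```

In the pre-norm stack, the sublayer outputs accumulate on top of the raw input (x plus the sum of the F terms), so the identity path from the first layer to the last never passes through a LayerNorm. Pre-norm stacks then typically apply a single LayerNorm after the last layer. In the normalized-skip variant, each layer re-normalizes the skip path, so earlier contributions are partly discarded. This is one plausible reading of the design choice, not necessarily the OpenNMT authors' stated rationale.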