Open a982385200 opened 5 years ago
I have also encountered this problem and would like to know why. WeChat: 13524052053
enc *= self.hp.d_model**0.5
Does this mean the scaling from scaled dot-product attention?
@wqw547243068, @915288938lx, @a982385200 this code comes right after the character embedding lookup; it is not part of multihead_attention. Its purpose is to balance the magnitudes of the character embedding and the positional_embedding before they are added together.
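A minimal NumPy sketch of why the scaling matters, assuming the common setup where the embedding table is initialized with small values (standard deviation about 1/sqrt(d_model)) and a sinusoidal positional encoding whose entries lie in [-1, 1] is added afterwards; the names and toy sizes below are illustrative, not the repo's actual code:

```python
# Illustrative sketch (not the repo's code): embedding values start tiny,
# so multiplying by d_model**0.5 brings them to the same O(1) scale as
# the sinusoidal positional encoding that gets added next.
import numpy as np

d_model = 512
vocab_size = 1000
seq_len = 4

rng = np.random.default_rng(0)
# Embedding table with std ~ 1/sqrt(d_model), a common initialization.
embed_table = rng.normal(0.0, d_model ** -0.5, size=(vocab_size, d_model))

tokens = np.array([5, 42, 7, 0])
enc = embed_table[tokens]           # (seq_len, d_model), entries ~ O(1/sqrt(d_model))
enc_scaled = enc * d_model ** 0.5   # the line in question: rescale to O(1)

# Sinusoidal positional encoding; every entry is in [-1, 1].
pos = np.arange(seq_len)[:, None]
i = np.arange(d_model)[None, :]
angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
pe = np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

# Without the scaling, |enc| is tiny compared to |pe|, and the token
# identity would be drowned out by the positional signal.
print(np.abs(enc).mean(), np.abs(enc_scaled).mean(), np.abs(pe).mean())
```

After scaling, the average magnitude of the embedding is comparable to that of the positional encoding, so the sum carries both signals.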
I want to know what this code does. Why multiply the embedding vector by d_model**0.5? Thanks!