Open a982385200 opened 5 years ago
I have also encountered this problem and would like to know why. WeChat: 13524052053
enc *= self.hp.d_model**0.5
Does this mean the scaling from scaled dot-product attention?
@wqw547243068, @915288938lx, @a982385200 this code comes right after the character embedding lookup; it is not part of multihead_attention. Its purpose is to balance the magnitudes of the character embedding and the positional_embedding before they are added together.
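A minimal NumPy sketch of why the scaling matters, assuming the common setup where the embedding table is initialized with small values (standard deviation about 1/sqrt(d_model)) and a sinusoidal positional encoding whose entries lie in [-1, 1] is added afterwards; the names and toy sizes below are illustrative, not the repo's actual code:

```python
# Illustrative sketch (not the repo's code): embedding values start tiny,
# so multiplying by d_model**0.5 brings them to the same O(1) scale as
# the sinusoidal positional encoding that gets added next.
import numpy as np

d_model = 512
vocab_size = 1000
seq_len = 4

rng = np.random.default_rng(0)
# Embedding table with std ~ 1/sqrt(d_model), a common initialization.
embed_table = rng.normal(0.0, d_model ** -0.5, size=(vocab_size, d_model))

tokens = np.array([5, 42, 7, 0])
enc = embed_table[tokens]           # (seq_len, d_model), entries ~ O(1/sqrt(d_model))
enc_scaled = enc * d_model ** 0.5   # the line in question: rescale to O(1)

# Sinusoidal positional encoding; every entry is in [-1, 1].
pos = np.arange(seq_len)[:, None]
i = np.arange(d_model)[None, :]
angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
pe = np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

# Without the scaling, |enc| is tiny compared to |pe|, and the token
# identity would be drowned out by the positional signal.
print(np.abs(enc).mean(), np.abs(enc_scaled).mean(), np.abs(pe).mean())
```

After scaling, the average magnitude of the embedding is comparable to that of the positional encoding, so the sum carries both signals.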
I want to know what this code does. Why multiply the embedding vector by d_model**0.5? Thanks!