Closed yuanxiaosc closed 6 years ago
Code implementation of Scaled Dot-Product Attention
In your code, it is "dot_product = dot_product * (1.0 / tf.sqrt(tf.cast(self.d_model, tf.float32))) ".
I think that should be "dot_product = dot_product * (1.0 / tf.sqrt(tf.cast(self.d_model/h, tf.float32))) " according to the paper.
o.k. got it. thank you.
Code implementation of Scaled Dot-Product Attention
2. dot product of Q,K
In your code, it is "dot_product = dot_product * (1.0 / tf.sqrt(tf.cast(self.d_model, tf.float32))) ".
I think that should be "dot_product = dot_product * (1.0 / tf.sqrt(tf.cast(self.d_model/h, tf.float32))) " according to the paper.