Kyubyong / transformer

A TensorFlow Implementation of the Transformer: Attention Is All You Need
Apache License 2.0

Must the value be linearly transformed in multi-head attention? #170

Open GuoYL36 opened 3 years ago

GuoYL36 commented 3 years ago

In the paper *Attention Is All You Need*, the query, key, and value are each linearly transformed in multi-head attention:

```python
Q = tf.layers.dense(queries, d_model, use_bias=True)  # (N, T_q, d_model)
K = tf.layers.dense(keys, d_model, use_bias=True)      # (N, T_k, d_model)
V = tf.layers.dense(values, d_model, use_bias=True)    # (N, T_k, d_model)
```

I would like to know whether the value must be linearly transformed in multi-head attention.
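For context, here is a minimal NumPy sketch (not the repository's code) of where these projections sit in scaled dot-product attention. The weight matrices below are random stand-ins for the learned parameters W_Q, W_K, W_V from the paper, and the function name and shapes are only illustrative:

```python
import numpy as np

def multihead_attention_sketch(queries, keys, values, num_heads, d_model):
    # Random stand-ins for the learned projection matrices W_Q, W_K, W_V.
    # In the paper, V is projected just like Q and K before attention is applied.
    rng = np.random.default_rng(0)
    d_k = d_model // num_heads
    W_Q = rng.normal(size=(d_model, d_model))
    W_K = rng.normal(size=(d_model, d_model))
    W_V = rng.normal(size=(d_model, d_model))

    Q = queries @ W_Q  # (N, T_q, d_model)
    K = keys @ W_K     # (N, T_k, d_model)
    V = values @ W_V   # (N, T_k, d_model)

    def split_heads(x):
        # (N, T, d_model) -> (N, num_heads, T, d_k)
        N, T, _ = x.shape
        return x.reshape(N, T, num_heads, d_k).transpose(0, 2, 1, 3)

    Qh, Kh, Vh = split_heads(Q), split_heads(K), split_heads(V)

    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    scores = Qh @ Kh.transpose(0, 1, 3, 2) / np.sqrt(d_k)   # (N, h, T_q, T_k)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)                # softmax over T_k

    out = weights @ Vh                                       # (N, h, T_q, d_k)
    N, _, T_q, _ = out.shape
    # Concatenate heads back to (N, T_q, d_model)
    return out.transpose(0, 2, 1, 3).reshape(N, T_q, d_model)
```

If the V projection were dropped, the attention output would just be a convex combination of the raw input vectors; the projection is what lets each head read out a different learned subspace of the values.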