Implementation of multi-head attention in modules

Kyubyong / transformer

A TensorFlow Implementation of the Transformer: Attention Is All You Need

Apache License 2.0

4.25k stars 1.29k forks

Open letmeheard opened 4 years ago

letmeheard commented 4 years ago

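In the multi-head attention implementation, Q, K, and V are each produced by a single linear projection and then split into h parts. Shouldn't there be h separate linear projections instead? Or has the d_model passed in here already been multiplied by h? I haven't looked closely at model.py yet.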
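To make the two variants in the question concrete, here is a minimal sketch in plain Python. It is not the repository's TensorFlow code; the sizes and the names `matmul`, `split_heads`, and `separate_heads` are illustrative assumptions. It compares "project once, then split into h chunks" with "h separate projections of width d_model / h", where each separate projection uses a column slice of the same weight matrix.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

random.seed(0)
seq_len, d_model, h = 3, 8, 4   # illustrative sizes, not taken from the repository
d_head = d_model // h           # width of each head

# Input sequence X (seq_len x d_model) and one full-width projection W (d_model x d_model).
X = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]
W = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_model)]

# Variant A: one projection, then split the last axis into h chunks.
Q = matmul(X, W)
split_heads = [[row[i * d_head:(i + 1) * d_head] for row in Q] for i in range(h)]

# Variant B: h separate projections, each using a column slice of W.
W_slices = [[row[i * d_head:(i + 1) * d_head] for row in W] for i in range(h)]
separate_heads = [matmul(X, W_i) for W_i in W_slices]

# Largest element-wise difference between the two variants across all heads.
max_diff = max(
    abs(a - b)
    for head_a, head_b in zip(split_heads, separate_heads)
    for row_a, row_b in zip(head_a, head_b)
    for a, b in zip(row_a, row_b)
)
print(max_diff < 1e-9)
```

Under these assumptions the two variants produce the same per-head values, which suggests that a single d_model-wide projection followed by a split can stand in for h independent projections of width d_model / h, without d_model being multiplied by h.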