Kyubyong / transformer

A TensorFlow Implementation of the Transformer: Attention Is All You Need

The multi-head attention implementation in modules #139

Open letmeheard opened 4 years ago

letmeheard commented 4 years ago

In the multi-head attention implementation, Q, K, and V each come from a single linear projection and are then split into h parts. Shouldn't there instead be h separate linear projections, one per head? Or is the input d_model here already multiplied by h? I haven't looked closely at model.py yet.
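
For reference, the two formulations coincide: in the paper, each head i has its own projection W_i^Q of shape (d_model, d_k) with d_k = d_model/h, and stacking those h matrices column-wise yields one (d_model, d_model) matrix, so a single dense layer followed by a split along the last axis computes exactly the same h per-head projections. A minimal NumPy sketch of that equivalence (shapes d_model=512, h=8 taken from the paper; `W_big` is a random placeholder weight, not the repo's actual code):

```python
import numpy as np

d_model, h, T = 512, 8, 10          # model width, number of heads, sequence length
d_k = d_model // h                  # per-head width (64 here, as in the paper)

x = np.random.randn(T, d_model)     # one sequence of token representations
W_big = np.random.randn(d_model, d_model)  # placeholder for a single big projection

# Formulation 1: one big projection, then split into h heads
Q_big = x @ W_big                              # (T, d_model)
heads_split = np.split(Q_big, h, axis=-1)      # h arrays of shape (T, d_k)

# Formulation 2: h "separate" projections, using the column blocks of W_big
heads_separate = [x @ W_big[:, i * d_k:(i + 1) * d_k] for i in range(h)]

# The two give identical results: splitting after one matmul equals
# h smaller matmuls against the corresponding weight blocks
for a, b in zip(heads_split, heads_separate):
    assert np.allclose(a, b)
```

So the learned parameters are the same either way; doing one large matmul and then splitting is simply more efficient on GPU than launching h small matmuls.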