JasonStraka / HUTformer

An implementation of HUTformer.

Question about q, k, v in the decoder #1

Open jexterliangsufe opened 9 months ago

jexterliangsufe commented 9 months ago

According to the CA formula in the paper, q should come from H_dec, while k and v should come from H'_enc. Is my understanding correct?

image[1] image[2]

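However, the denominator d_dec sits where d_k appears in the original attention formula. My guess is that d_dec = d_enc, so the author simply picked one of the two to write there?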
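For readers without the images, the formula under discussion can be written out roughly as follows. This is a reconstruction from the description in this thread, not a copy of the paper's equation: the projection matrices W_Q, W_K, W_V and the square root in the denominator are assumed from the standard scaled dot-product attention in [2].

```latex
\[
Q = H_{dec} W_Q, \qquad K = H'_{enc} W_K, \qquad V = H'_{enc} W_V
\]
\[
\mathrm{CA}(H_{dec}, H'_{enc}) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_{dec}}}\right) V
\]
```

For the product Q K^T to be defined, Q and K must share the same last dimension, which is the quantity [2] calls d_k. The question is therefore which of d_dec or d_enc that shared dimension is.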

[1] HUTFormer: Hierarchical U-Net Transformer for Long-Term Traffic Forecasting
[2] Attention Is All You Need

JasonStraka commented 7 months ago

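Thanks for pointing this out. In my reimplementation I had swapped the order of the decoder inputs H_dec and H'_enc. As I understand it, H_dec is [BN, num_patch*2, embed_dim/2], H'_enc is [BN, num_patch, embed_dim/2], and H_enc is [BN, num_patch, embed_dim] (the paper states "The Linear(·) layer is used to transform the hidden dimension from d_enc to d_dec"). So d_dec is the actual d_k, and d_enc is twice d_dec.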
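The shapes above can be checked with a small sketch. This is a dependency-free illustration in plain Python for a single sample (the BN batch dimension is dropped), not the repository's code: the helper names (`matmul`, `softmax`, `rand_matrix`, `decoder_attention`), the random weights, the bias-free stand-in for Linear(·), and the tiny sizes are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Random matrix used as a stand-in for learned weights or activations."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def decoder_attention(h_dec, h_enc_proj, w_q, w_k, w_v):
    """Queries come from H_dec; keys and values come from H'_enc."""
    q = matmul(h_dec, w_q)             # [num_patch * 2, d_dec]
    k = matmul(h_enc_proj, w_k)        # [num_patch, d_dec]
    v = matmul(h_enc_proj, w_v)        # [num_patch, d_dec]
    d_k = len(k[0])                    # shared last dimension, here d_dec
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)            # [num_patch * 2, num_patch]
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


rng = random.Random(0)
num_patch, d_enc = 3, 8                # hypothetical sizes
d_dec = d_enc // 2                     # d_enc is twice d_dec

h_enc = rand_matrix(num_patch, d_enc, rng)        # H_enc
h_dec = rand_matrix(num_patch * 2, d_dec, rng)    # H_dec
w_lin = rand_matrix(d_enc, d_dec, rng)            # stand-in for Linear(.), no bias
h_enc_proj = matmul(h_enc, w_lin)                 # H'_enc: [num_patch, d_dec]

w_q, w_k, w_v = (rand_matrix(d_dec, d_dec, rng) for _ in range(3))
out, weights = decoder_attention(h_dec, h_enc_proj, w_q, w_k, w_v)
print(len(out), len(out[0]))           # output keeps the shape of H_dec
```

With the inputs in this order, the output has one row per decoder position (num_patch*2) and d_dec columns, and each row of attention weights spans the num_patch encoder positions. Swapping the two inputs would instead produce num_patch output rows, which is one quick way to notice the mix-up.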