ZhuiyiTechnology / GAU-alpha

A Transformer model based on the Gated Attention Unit (preview release)

I don't see how the two GAU layers are realized #3

Open qianlan001 opened 2 years ago

qianlan001 commented 2 years ago

Su said that RoFormerV2's Attention+FFN was replaced with two layers of GAU, but I only seem to see one GAU layer. Is there something I'm not understanding correctly?

ZhuiyiTechnology commented 1 year ago

GAU-alpha has 24 GAU layers in total, while RoFormerV2 has 12 Attention layers + 12 FFN layers.
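To make the layer-count arithmetic concrete, here is a minimal sketch (with hypothetical names; the real model classes are not shown in this thread) of how each Attention+FFN pair in RoFormerV2 corresponds to two GAU layers in GAU-alpha:

```python
# Hypothetical sketch: compare the sublayer stacks of the two architectures.
# RoFormerV2: 12 blocks, each block = Attention followed by FFN -> 24 sublayers.
# GAU-alpha: each Attention+FFN pair is replaced by two GAU layers -> 24 GAU layers.

def roformer_v2_stack(num_blocks=12):
    """Return the sublayer sequence of a RoFormerV2-style encoder."""
    layers = []
    for _ in range(num_blocks):
        layers.append("Attention")
        layers.append("FFN")
    return layers

def gau_alpha_stack(num_blocks=12):
    """Return the sublayer sequence after swapping each Attention+FFN pair
    for two GAU layers, as described in the maintainer's reply."""
    return ["GAU"] * (2 * num_blocks)

if __name__ == "__main__":
    # Both stacks have the same depth: 24 sublayers.
    print(len(roformer_v2_stack()), len(gau_alpha_stack()))  # 24 24
```

So there is only one GAU module definition in the code, but it is stacked 24 times, matching the total sublayer depth of RoFormerV2's 12 Attention + 12 FFN layers.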