LeeSureman / Flat-Lattice-Transformer

code for ACL 2020 paper: FLAT: Chinese NER Using Flat-Lattice Transformer

The transformer's residual add appears to be wrong #111

Open iamqiz opened 2 years ago

iamqiz commented 2 years ago

After the multi-head attention and feed-forward layers, the residual add should sum that layer's output with that layer's input. In the code, however, the add is output + output rather than output + input. See lines 1132 and 1135 in the linked code: https://github.com/LeeSureman/Flat-Lattice-Transformer/blob/200af2cf64cd4cd6dd0e357bbd48609203abdfd8/V1/modules.py#L1119-L1137
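For reference, the residual pattern the issue says the code should follow (sublayer output added to the sublayer's *input*, then normalized) can be sketched in PyTorch like this. This is a minimal illustration, not the repo's actual module; the name `AddNorm` is made up:

```python
import torch
import torch.nn as nn

class AddNorm(nn.Module):
    """Standard Transformer add-&-norm: norm(x + sublayer(x)).

    The issue reports that the repo instead computes something like
    norm(sublayer(x) + sublayer(x)), i.e. output + output.
    """
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, sublayer_out):
        # correct residual: add the sublayer INPUT x, not the output again
        return self.norm(x + sublayer_out)
```

A sublayer call would then look like `h = add_norm(x, attention(x))` rather than feeding the attention output into both operands of the add.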

The default post-processing configuration is "an", as shown here: https://github.com/LeeSureman/Flat-Lattice-Transformer/blob/200af2cf64cd4cd6dd0e357bbd48609203abdfd8/V1/flat_main.py#L144

The post-processing function is at the following lines: https://github.com/LeeSureman/Flat-Lattice-Transformer/blob/200af2cf64cd4cd6dd0e357bbd48609203abdfd8/V1/modules.py#L1151-L1161
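A string such as "an" typically drives a post-process step character by character ('a' = residual add, 'n' = layer norm, often 'd' = dropout), so "an" means add-then-norm. The sketch below is a hypothetical reconstruction of that pattern for illustration only; the class name, character codes, and defaults are assumptions, not the repo's exact code:

```python
import torch
import torch.nn as nn

class PostProcess(nn.Module):
    """Hypothetical string-driven post-processing step.

    Each character in `mode` selects one operation, applied in order:
      'a' -> residual add (output + input)
      'n' -> layer normalization
      'd' -> dropout
    With mode='an', this performs add-then-norm.
    """
    def __init__(self, dim, mode='an', dropout=0.1):
        super().__init__()
        self.mode = mode
        self.norm = nn.LayerNorm(dim)
        self.drop = nn.Dropout(dropout)

    def forward(self, inp, out):
        for op in self.mode:
            if op == 'a':
                out = out + inp  # the residual add under discussion
            elif op == 'n':
                out = self.norm(out)
            elif op == 'd':
                out = self.drop(out)
        return out
```

Under this reading, the reported bug would amount to passing the sublayer output as both `inp` and `out`, so the 'a' step adds the output to itself.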

zhangliang-chn commented 8 months ago

Did your results improve after making this fix? After I modified the add & norm, my results actually got worse, and I don't understand why.