Question about the residual structure. On the resume dataset I only reach 95.15%, still short of 95.5%.

wn1652400018 commented 3 years ago

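1. I keep feeling that the residual structure in the author's transformer_encoder is written incorrectly, but after I changed it the results got worse, and I don't know why.

    class Layer_Process(nn.Module):
        def __init__(self, process_sequence, hidden_size, dropout=0, use_pytorch_dropout=True):
            ...  # body omitted in the original post

        def forward(self, inp):
            output = inp
            for op in self.process_sequence:  # process_sequence = 'an'
                if op == 'a':
                    output = output + inp  # Isn't this equivalent to inp * 2?
                elif op == 'd':
                    output = self.dropout(output)
                elif op == 'n':
                    output = self.layer_norm(output)
            return output

2. I changed a few hyperparameters: batch is set to 5, and k_proj is set to True (the author sets it to False). When fusing the position embeddings I used all of ss, se, es and ee, whereas the author's hyperparameters use only ss and ee. To make each attention head larger, I also slightly increased the hidden size. Setting k_proj to True alone reaches 95.0%; using all four relative positions did not seem to help; increasing the hidden size brought it up to 95.15%. My laptop has limited memory, so I did not increase the hidden size any further.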
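To make the question in point 1 concrete, here is a minimal sketch in plain Python, with lists standing in for tensors so it runs without PyTorch. It assumes the module is called with a single tensor, exactly as in the snippet above. `layer_norm` is a simplified stand-in without learned gain or bias, and `layer_process_two_inputs` is a hypothetical variant written only for illustration, not the author's corrected code. Under these assumptions the 'a' step doubles the input, and because layer normalization is scale-invariant (up to `eps`), the sequence 'an' gives almost the same result as normalizing `inp` alone.

```python
import math


def layer_norm(x, eps=1e-5):
    # Simplified layer norm over one vector (no learned gain/bias; an assumption).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def layer_process_as_posted(inp, process_sequence="an"):
    # Mirrors the control flow of the posted forward(); dropout is omitted.
    output = list(inp)
    for op in process_sequence:
        if op == "a":
            # output is still inp here, so this step yields 2 * inp.
            output = [o + i for o, i in zip(output, inp)]
        elif op == "n":
            output = layer_norm(output)
    return output


def layer_process_two_inputs(sublayer_in, sublayer_out, process_sequence="an"):
    # Hypothetical variant: the residual adds the sublayer's input to its output.
    output = list(sublayer_out)
    for op in process_sequence:
        if op == "a":
            output = [o + r for o, r in zip(output, sublayer_in)]
        elif op == "n":
            output = layer_norm(output)
    return output


x = [1.0, 2.0, 4.0]
doubled = layer_process_as_posted(x, "a")       # residual step only
normed_twice = layer_process_as_posted(x, "an")  # add, then normalize
normed_plain = layer_norm(x)                     # normalize inp directly
max_gap = max(abs(a - b) for a, b in zip(normed_twice, normed_plain))
print(doubled, max_gap)
```

Whether the two-input form actually improves accuracy is an empirical question that this sketch cannot answer; it only shows what each form computes.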

wn1652400018 commented 3 years ago

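@LeeSureman I saw that people in other issues also raised the residual problem, and the author said that fixing it improved the results. After my change, however, the results dropped. I am not sure whether I wrote it incorrectly. Could the author update the code?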

LeeSureman commented 3 years ago

This dataset is fairly small, so you may need to run it several times. I select the best model on the dev set.
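As an illustration of selecting on the dev set, here is a minimal sketch. The `EpochResult` record, the `select_by_dev` helper and all scores are made up for the example; they do not come from the repository or from any actual run.

```python
from dataclasses import dataclass


@dataclass
class EpochResult:
    epoch: int
    dev_f1: float
    test_f1: float


def select_by_dev(history):
    # Choose the epoch with the highest dev F1; test F1 is only reported, never used to choose.
    return max(history, key=lambda r: r.dev_f1)


# Made-up scores for illustration only.
history = [
    EpochResult(epoch=1, dev_f1=0.940, test_f1=0.938),
    EpochResult(epoch=2, dev_f1=0.951, test_f1=0.949),
    EpochResult(epoch=3, dev_f1=0.949, test_f1=0.953),
]
best = select_by_dev(history)
print(best.epoch, best.test_f1)
```

In this made-up history the highest test score is at epoch 3, but the reported score is the one from epoch 2, because that is where the dev score peaks.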

wn1652400018 commented 3 years ago

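Hello, I saw you mention in other issues that the results improved after you corrected the residual structure. I have tried many times, though, and the results always drop; I don't know why. Could you update that part of the code? Perhaps I wrote it incorrectly. Thank you.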

wn1652400018 commented 3 years ago

Sorry, I misread this. The best model is indeed selected on the dev set.

LeeSureman commented 3 years ago

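What is your email address? I'll send it to you privately. In my earlier code I fixed the random seed for each dataset to raise the chance of reproducing the results reported in the paper. If I updated the code in place, those seeds would no longer be useful, and I currently have no spare GPUs to search for new seeds.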

wn1652400018 commented 3 years ago

1652400018@qq.com. Thank you very much.
