Why is the accuracy so high?

gitabtion / BertBasedCorrectionModels

PyTorch implementations of BERT-based Spelling Error Correction Models.

Apache License 2.0

265 stars, 43 forks

Closed: YouranShan closed this issue 3 years ago

YouranShan commented 3 years ago

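Why are the scores from running this so much higher than the ones reported in the papers? Even plain BERT fine-tuning reaches state of the art. At first I noticed the amount of training data was different and assumed the gain came from removing dirty data. But after adding all the training data back, the scores are still very high; even fine-tuned BERT alone would count as SOTA... Why?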

gitabtion commented 3 years ago

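Mainly because it uses the MLM (masked language model) weights. Most papers' implementations today don't use these weights, which holds their results back.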

benbijituo commented 3 years ago

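The data is different. The data here was preprocessed by the author himself, and a lot of the noise in the original data is gone. Compare it with the data SpellGCN uses and you'll see many differences. SpellGCN also uses the MLM weights.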

Dioxideme commented 2 years ago


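Hi, could you give some examples of which noise was removed? I did a quick comparison and found that most of the data is the same 😂 The only difference I found was cases like 著 vs. 着.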
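A minimal sketch of the kind of dataset comparison discussed above, assuming each version of the data has already been loaded into a dict that maps a sample id to its target (corrected) sentence. The data format and the function name `diff_char_pairs` are hypothetical; they are not part of BertBasedCorrectionModels or SpellGCN. The function tallies character substitutions between the two versions, so systematic differences such as 著/着 show up as counted pairs.

```python
from collections import Counter


def diff_char_pairs(data_a, data_b):
    """Compare two versions of a dataset keyed by sample id (hypothetical format).

    data_a, data_b: dict mapping a sample id to its target sentence.
    Returns (pair_counts, length_mismatch_ids, missing_ids):
      pair_counts         -- Counter of (char_in_a, char_in_b) substitutions,
                             taken only from shared samples of equal length
      length_mismatch_ids -- shared ids whose sentences differ in length
      missing_ids         -- ids present in only one of the two versions
    """
    pair_counts = Counter()
    length_mismatch_ids = []
    shared_ids = data_a.keys() & data_b.keys()
    for sid in sorted(shared_ids):
        a, b = data_a[sid], data_b[sid]
        if len(a) != len(b):
            # Character-by-character alignment is only meaningful for equal lengths.
            length_mismatch_ids.append(sid)
            continue
        for ca, cb in zip(a, b):
            if ca != cb:
                pair_counts[(ca, cb)] += 1
    missing_ids = sorted(data_a.keys() ^ data_b.keys())
    return pair_counts, length_mismatch_ids, missing_ids


if __name__ == "__main__":
    version_a = {"1": "他看着我", "2": "今天天气很好", "3": "多余的样本"}
    version_b = {"1": "他看著我", "2": "今天天气很好", "4": "新增的样本"}
    pairs, mismatched, missing = diff_char_pairs(version_a, version_b)
    for (ca, cb), n in pairs.most_common():
        print(f"{ca} -> {cb}: {n}")
    print("length mismatches:", mismatched)
    print("ids in only one version:", missing)
```

Sorting the substitution pairs by frequency makes it easy to see whether the differences are systematic (one character pair across many samples) or scattered noise fixes.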