EMNLP-2019/11-CrossWeigh: Training Named Entity Tagger from Imperfect Annotations

Summary:

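The authors corrected the annotation errors in the test set. They then score training sentences to identify potential label mistakes and lower the weights of those sentences. An NER model trained this way is aware of annotation errors.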

Resource:

pdf, Japanese article
code
paper-with-code

Paper information:

Author: University of Illinois at Urbana-Champaign
Dataset: CoNLL03
keywords:

Notes:

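NER label mistakes come in two kinds: mistakes in the test set distort evaluation results, and mistakes in the training set hurt the NER model that is trained on it. This paper manually corrects the label mistakes in the CoNLL03 test set and re-evaluates a range of models on the corrected set. It then proposes a new framework, CrossWeigh, to handle label mistakes during training.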

CrossWeigh has two parts: mistake estimation and mistake re-weighing.

mistake estimation: it identifies the potential label mistakes in training data through a cross-checking process
mistake re-weighing: it lowers the weights of these instances during the training of the final NER model. The cross-checking process is inspired by k-fold cross validation; the difference is that, from each fold's training data, it removes the data containing any of the entities that appear in the held-out fold.
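
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the BIO tag format, the lookup tagger, the number of iterations, and the weight formula `epsilon ** count` (with `epsilon = 0.7`) are all assumptions made for the example and are not taken from these notes.

```python
import random
from collections import Counter, defaultdict

def entities(sentence):
    """Return the set of entity surface strings in a BIO-tagged sentence."""
    found, current = set(), []
    for token, tag in sentence:
        if tag.startswith("B-"):
            if current:
                found.add(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                found.add(" ".join(current))
            current = []
    if current:
        found.add(" ".join(current))
    return found

def estimate_mistakes(data, train_fn, k=5, seed=0):
    """Mistake estimation: flag sentences whose labels disagree with a
    model trained on the other folds, minus sentences sharing an entity."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    flagged = set()
    for fold in folds:
        held_out = set(fold)
        fold_entities = set().union(*(entities(data[i]) for i in fold))
        # Entity-disjoint filtering: drop training sentences that contain
        # any entity seen in the held-out fold.
        train = [data[i] for i in order
                 if i not in held_out and not (entities(data[i]) & fold_entities)]
        predict = train_fn(train)
        for i in fold:
            tokens = [tok for tok, _ in data[i]]
            gold = [tag for _, tag in data[i]]
            if predict(tokens) != gold:
                flagged.add(i)
    return flagged

def crossweigh_weights(data, train_fn, k=5, iterations=3, epsilon=0.7):
    """Mistake re-weighing: lower a sentence's weight each time it is flagged.
    The formula epsilon ** count is an assumption of this sketch."""
    counts = Counter()
    for t in range(iterations):
        counts.update(estimate_mistakes(data, train_fn, k=k, seed=t))
    return [epsilon ** counts[i] for i in range(len(data))]

def train_lookup_tagger(train):
    """Hypothetical stand-in for a real NER model: majority tag per token."""
    votes = defaultdict(Counter)
    for sentence in train:
        for token, tag in sentence:
            votes[token][tag] += 1
    def predict(tokens):
        return [votes[t].most_common(1)[0][0] if t in votes else "O"
                for t in tokens]
    return predict

data = [
    [("Chicago", "B-LOC"), ("is", "O"), ("windy", "O")],
    [("Chicago", "B-LOC"), ("is", "O"), ("big", "O")],
    [("Chicago", "O"), ("won", "O")],   # likely label mistake
    [("it", "O"), ("rained", "O")],
]
weights = crossweigh_weights(data, train_lookup_tagger, k=4)
```

The returned weights would then scale each sentence's loss when training the final model. Note that the lookup tagger cannot generalize to entities it has never seen, so under entity-disjoint filtering it also flags correctly labeled sentences that contain entities; a real tagger that uses context is needed for the flags to be meaningful.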

Model Graph:

Result:

Thoughts:

Next Reading:
