TACL-2021-Partially Supervised Named Entity Recognition via the Expected Entity Ratio Loss

Summary:

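Most non-native annotators miss some entities when labeling (missing tags), so making good use of such high-precision, low-recall training data is an open problem. How can the missing tags be recovered? The authors treat the unannotated tags as latent variables. Concretely, they combine marginal tag likelihood training with a new criterion, the Expected Entity Ratio (EER), which controls the relative proportion of entity tags in a sentence.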
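A minimal sketch of how an EER-style penalty could be computed, assuming it is a hinge on the gap between the model's expected entity ratio and a target ratio. The names `rho` (target ratio), `gamma` (margin) and `lam` (weight) are hypothetical, and the per-token probabilities `p_entity` are assumed to come from the tagger's marginals (1 minus the probability of the O tag). This is an illustration under those assumptions, not the authors' code.

```python
def expected_entity_ratio(p_entity):
    """Average marginal probability that a token carries a non-O tag.

    p_entity[i] is assumed to be 1 - p(tag_i = O) under the model,
    pooled over all tokens in a batch.
    """
    if not p_entity:
        raise ValueError("need at least one token")
    return sum(p_entity) / len(p_entity)


def eer_loss(p_entity, rho, gamma):
    """Hinge penalty (assumed form): zero while the expected ratio
    stays within gamma of the target rho, linear outside that band."""
    u = expected_entity_ratio(p_entity)
    return max(0.0, abs(u - rho) - gamma)


def total_loss(marginal_nll, p_entity, rho, gamma, lam):
    """Marginal tag NLL plus the weighted ratio penalty (hypothetical weighting)."""
    return marginal_nll + lam * eer_loss(p_entity, rho, gamma)


# Usage: the model predicts far more entities (ratio 0.3) than the target 0.1.
print(eer_loss([0.9, 0.1, 0.1, 0.1], rho=0.1, gamma=0.05))  # about 0.15
```

The hinge leaves the model free inside the band `rho ± gamma`, so the penalty only pushes back when the predicted entity density drifts too far from the expected one.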

Resource:

pdf
code

Paper information:

Author:
Dataset:
Keywords:

Notes:

All unannotated tokens are treated as having latent tags. Under this view, a sentence consists of a token sequence together with a set of observed (tag, position) pairs.
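To make the latent-tag view concrete, here is a minimal sketch of a marginal tag likelihood for a linear-chain CRF, assuming every tag is allowed at unannotated positions and only the observed tag is allowed at annotated ones. The loss is the log partition over all sequences minus the log partition over sequences consistent with the observed (tag, position) pairs. The function names and the score layout (`emissions`, `transitions`) are hypothetical, and BILOU transition constraints are omitted for brevity.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_log_z(emissions, transitions, allowed):
    """Forward algorithm restricted to allowed[i] (a set of tag ids) at token i.

    emissions[i][t]: score of tag t at token i.
    transitions[s][t]: score of moving from tag s to tag t.
    """
    n_tags = len(emissions[0])
    alpha = [emissions[0][t] if t in allowed[0] else -math.inf
             for t in range(n_tags)]
    for i in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[s] + transitions[s][t] for s in range(n_tags)])
            + emissions[i][t]
            if t in allowed[i] else -math.inf
            for t in range(n_tags)
        ]
    return logsumexp(alpha)


def partial_nll(emissions, transitions, observed):
    """Negative marginal log-likelihood of the observed pairs.

    observed: dict {position: tag_id}; all other positions are latent
    and summed over every tag.
    """
    n, n_tags = len(emissions), len(emissions[0])
    all_tags = set(range(n_tags))
    free = [all_tags] * n
    constrained = [{observed[i]} if i in observed else all_tags
                   for i in range(n)]
    return (forward_log_z(emissions, transitions, free)
            - forward_log_z(emissions, transitions, constrained))


# Usage: 3 tokens, 2 tags, all scores zero, one observed tag.
zeros_e = [[0.0, 0.0] for _ in range(3)]
zeros_t = [[0.0, 0.0] for _ in range(2)]
print(partial_nll(zeros_e, zeros_t, {1: 0}))  # log 2, about 0.693
```

With no observed pairs the loss is zero, which reflects the point of the latent view: unannotated tokens are not pushed towards O, they are simply marginalized out.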

Model Graph:

Result:

Good results on low-recall data across 7 datasets in 6 languages.

Thoughts:

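The paper uses Chinese data, so it is worth checking how a token is defined there. Since the tagging scheme is BILOU, the Chinese text was presumably word-segmented first, with the resulting words used as tokens for training.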
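If the Chinese text is segmented into words first, each word receives exactly one BILOU tag. A minimal sketch of span-to-BILOU encoding, assuming entity spans are given as non-overlapping (start, end_exclusive, label) ranges over token indices; the function name and the example sentence are hypothetical. The same function works unchanged on a character-level token list, which is the alternative to check in the paper.

```python
def spans_to_bilou(n_tokens, spans):
    """Encode entity spans over tokens as BILOU tags.

    spans: list of (start, end_exclusive, label), assumed non-overlapping.
    Tokens outside every span get "O".
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"bad span {(start, end, label)}")
        if end - start == 1:
            tags[start] = f"U-{label}"          # single-token entity
        else:
            tags[start] = f"B-{label}"          # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"          # inside tokens
            tags[end - 1] = f"L-{label}"        # last token
    return tags


# Usage on a hypothetical word-segmented sentence.
words = ["北京", "大学", "位于", "海淀区"]
print(spans_to_bilou(len(words), [(0, 2, "ORG"), (3, 4, "LOC")]))
# ['B-ORG', 'L-ORG', 'O', 'U-LOC']
```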

Next Reading:
