ACL-2019/07-Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning

Summary:

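Solves the NER task using only unlabeled data and an entity dictionary, by treating distantly supervised NER (DS-NER) as a positive-unlabeled (PU) learning problem.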

Resource:

pdf, Japanese-language article
code
[paper-with-code](

Paper information:

Author:
Dataset:
keywords:

Notes:

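The goal is to solve the NER task using only unlabeled data and an entity dictionary. The simplest approach is to scan the query text and check whether the dictionary contains each candidate entity, but this places heavy demands on the quality of the dictionary, so it usually performs poorly. For example, with phrasal (multi-word) entities this kind of matching does badly: it may recognize only a single word of the phrase.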
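The failure mode described above can be illustrated with a minimal sketch of word-level dictionary matching. The function name `dictionary_label`, the toy dictionary, and the example sentence are all hypothetical and are not taken from the paper or its code.

```python
def dictionary_label(tokens, dictionary):
    """Mark each token 1 if it appears in the dictionary, else 0.

    A 0 here means "unlabeled", not "confirmed non-entity".
    """
    return [1 if tok.lower() in dictionary else 0 for tok in tokens]


# Hypothetical single-word dictionary that is missing "barack" and "new".
toy_dictionary = {"obama", "york"}

tokens = "Barack Obama visited New York".split()
labels = dictionary_label(tokens, toy_dictionary)

for tok, lab in zip(tokens, labels):
    print(f"{tok}\t{lab}")
```

Here `labels` is `[0, 1, 0, 0, 1]`: only "Obama" and "York" are matched, so each two-word entity is only partially labeled.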

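One remedy is to go a step further and apply supervised learning on top of the dictionary annotations. However, this still cannot guarantee that all entity words are covered.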

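The task can be treated as a positive-unlabeled (PU) learning problem, so PU algorithms can be applied. Entity words labeled by the dictionary are treated as positive data, and everything else as unlabeled data. The key assumption of this work is that the labeled P data represent the true distribution of class P. In addition, the dictionary can label only some of the words in a phrasal entity. To address this, the authors propose an adapted method, motivated by the AdaSampling algorithm (Yang et al., 2017), to enrich the dictionary.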
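To make the PU formulation concrete, here is a minimal sketch of a non-negative PU risk estimate. These notes do not say which estimator the paper uses, so this is a generic illustration under assumptions: the function name `pu_risk`, the absolute-error loss, scores in [0, 1], and the class prior value are all chosen for the example.

```python
def pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk from classifier scores f(x) in [0, 1].

    pos_scores: scores on dictionary-labeled (positive) words
    unl_scores: scores on unlabeled words
    prior:      assumed fraction of positives among all words
    Loss (an assumption): |1 - f| for label 1, |f| for label 0.
    """
    # Risk of positives when treated as positive.
    risk_p_pos = sum(1.0 - f for f in pos_scores) / len(pos_scores)
    # Risk of positives if they were treated as negative.
    risk_p_neg = sum(f for f in pos_scores) / len(pos_scores)
    # Risk of unlabeled words when treated as negative.
    risk_u_neg = sum(f for f in unl_scores) / len(unl_scores)
    # The unlabeled set contains hidden positives, so subtract their share;
    # clip at zero so the estimated negative risk never goes below zero.
    negative_part = max(0.0, risk_u_neg - prior * risk_p_neg)
    return prior * risk_p_pos + negative_part


# Hypothetical scores: confident on positives, low on unlabeled words.
risk = pu_risk([0.9, 0.8], [0.1, 0.2, 0.6], prior=0.3)
print(risk)
```

The subtraction relies on the assumption noted above, that the labeled positive words are representative of the whole positive class; without it, `risk_p_neg` would not be a valid estimate of the positives hidden in the unlabeled data.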

Model Graph:

Result:

Thoughts:

Next Reading:
