BrambleXu / knowledge-graph-learning

A curated list of awesome knowledge graph tutorials, projects and communities.

BIBM-2019-Distantly Supervised Biomedical Named Entity Recognition with Dictionary Expansion #282

Open BrambleXu opened 4 years ago

BrambleXu commented 4 years ago

Summary:

An application of 275 to the Bio domain. The model is based on AutoNER, but the emphasis is on how to automatically build a high-quality dictionary.

Resource:

Paper information:

Notes:

This paper seems to share my general direction: applying AutoNER to a specific domain.

A recent work, AutoNER [24], uses a neural model that leverages distant supervision from entity dictionaries. However, these existing studies can only use limited information from the user-input dictionaries, especially when the dictionaries are incomplete in real-world applications.

Our AUTOBIONER framework does not need any human annotated data and relies on incomplete entity dictionaries.

AUTOBIONER first exploits statistical signals from massive corpora for candidate entity generation and user-input dictionaries for training example annotation. Since the dictionaries are assumed to be incomplete, AUTOBIONER performs a novel automatic entity set expansion for corpus-level new entity recognition and dictionary completion.

It treats matched entities as positive examples to infer the type of unmatched candidates using context information. The expanded dictionaries are then used as distant supervision to train a neural model for BioNER.
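To make the last step concrete, here is a minimal sketch of how a dictionary can act as distant supervision: tokens are labeled by longest-match lookup, and the resulting labels would serve as training data for a tagger. Everything in it is an assumption for illustration (the `distant_label` function, the toy dictionary, the BIO label format, the `max_len` cutoff); it is not the labeling procedure or tagging scheme of AutoNER or AUTOBIONER.

```python
def distant_label(tokens, dictionary, max_len=5):
    """Label tokens by longest-match lookup against an entity dictionary.

    dictionary maps a tuple of lowercased tokens to an entity type.
    Returns one BIO label per token (hypothetical format, for illustration).
    """
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            if span in dictionary:
                etype = dictionary[span]
                labels[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{etype}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


# Toy dictionary and sentence, invented for this example.
toy_dictionary = {
    ("acetylsalicylic", "acid"): "Chemical",
    ("cox-1",): "Gene",
}
toy_tokens = ["Acetylsalicylic", "acid", "inhibits", "COX-1", "."]
toy_labels = distant_label(toy_tokens, toy_dictionary)
print(list(zip(toy_tokens, toy_labels)))
```

Any mention missing from the dictionary stays labeled `O`, which is exactly why an incomplete dictionary hurts recall and why the paper spends its effort on dictionary expansion.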

Based on the description above, their focus is on how to automatically build an expanded dictionary.

Model Graph:

image

My reading above was correct: this paper is mainly about building the dictionary.

A. Phrase Mining and Dictionary Matching

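Phrase Mining. Uses AutoPhrase.

Dictionary Tailoring. To avoid producing too many false positives when matching aliases, a dictionary tailoring step is added. The dictionary is filtered against the corpus: if an entity's canonical name never appears in the corpus, not even once, that entry is removed. (This is mainly aimed at improving precision. In a real setting, however, there is no such fixed corpus; nobody knows where a company name will show up.)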
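The tailoring rule can be sketched in a few lines. This is only an illustration under my own assumptions: the dictionary is modeled as a mapping from canonical name to a set of aliases, matching is case-insensitive on word tokens, and the names `normalize` and `tailor_dictionary` are hypothetical. The paper's actual matching details may differ.

```python
import re


def normalize(text):
    """Lowercase and keep only word tokens, joined by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def tailor_dictionary(dictionary, corpus_sentences):
    """Keep only entries whose canonical name occurs in the corpus.

    dictionary maps canonical name -> set of aliases (assumed layout).
    Padding with spaces makes the substring test match whole tokens only.
    """
    normalized = [f" {normalize(s)} " for s in corpus_sentences]
    kept = {}
    for canonical, aliases in dictionary.items():
        name = normalize(canonical)
        if not name:
            continue  # nothing matchable in this canonical name
        needle = f" {name} "
        if any(needle in sentence for sentence in normalized):
            kept[canonical] = aliases
    return kept


# Toy data, invented for this example.
toy_corpus = [
    "Aspirin reduces the risk of heart attack.",
    "Patients received ibuprofen daily.",
]
toy_entries = {
    "aspirin": {"ASA", "acetylsalicylic acid"},
    "warfarin": {"coumadin"},
    "ibuprofen": {"advil"},
}
tailored = tailor_dictionary(toy_entries, toy_corpus)
print(sorted(tailored))
```

Here `warfarin` is dropped together with its alias `coumadin`, because the canonical name never occurs in the toy corpus. That removes a source of false-positive alias matches, at the cost of tying the dictionary to one particular corpus.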

B. Entity Expansion

image

This part is the core of the paper.

Result:

Thoughts:

Next Reading:

JackySnake commented 4 years ago

Where can I read this paper? I haven't been able to find it.