BrambleXu / knowledge-graph-learning

A curated list of awesome knowledge graph tutorials, projects and communities.

KDD-2019-Human-in-the-loop ML Systems for Entity Extraction #283

Open BrambleXu opened 4 years ago

BrambleXu commented 4 years ago

Summary:

An approach that combines regexes with deep learning and uses weak labels for active learning.

Resource:

Paper information:

Notes:

In one sentence: We propose a framework that combines the advantages of regexes and deep learning, coupled with weak supervision and active learning.

Two common approaches to recognize regex-like entities are to (1) manually create a regex and (2) train a machine learning model.

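As the entities get more complex, a hand-written regex becomes very complicated. ML approaches either generate a regex automatically or avoid regexes altogether. The former is rarely used; the latter is essentially a sequence labeling model, but it needs a large amount of labeled data.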

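Earlier work on a human-in-the-loop (HIL) framework established a best practice: first use a regex to generate weak labels and pretrain a neural network on them, then fine-tune the network on manually labeled substrings. This worked better than training from scratch. Thus, the results indicate that writing a regex before manual labeling is highly desirable. However, that work did not account for the cost in human time, and that cost is what this paper examines.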
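To make the weak-labeling step concrete, here is a minimal sketch of how regex matches could be turned into character-level tags for pretraining. The date pattern, the B/I/O tag scheme, and the function name `weak_char_labels` are assumptions for illustration only; the notes do not say which regexes or label format the paper uses.

```python
import re

# Hypothetical regex for illustration: ISO-style dates such as 2019-08-04.
DATE_REGEX = re.compile(r"\d{4}-\d{2}-\d{2}")


def weak_char_labels(text, pattern=DATE_REGEX):
    """Return one B/I/O tag per character, derived only from regex matches.

    These tags are "weak" labels: they are as good as the regex, no better.
    """
    labels = ["O"] * len(text)
    for match in pattern.finditer(text):
        start, end = match.span()
        labels[start] = "B"          # first character of the matched entity
        for i in range(start + 1, end):
            labels[i] = "I"          # remaining characters of the entity
    return labels


text = "Submitted on 2019-08-04, revised 2019-09-01."
labels = weak_char_labels(text)
```

A sequence labeling model would be pretrained on `(text, labels)` pairs like this one, and then fine-tuned on a smaller set of manually corrected labels.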

No previous paper had combined weak supervision with active learning; this paper studies that combination.
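As a rough illustration of the active learning side, the sketch below picks the examples a model is least sure about and sends those to the human annotator. Entropy-based uncertainty sampling is a common acquisition strategy and is assumed here; the notes do not record which strategy the paper actually uses, and the names `entropy`, `select_for_labeling`, and `budget` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Pick the `budget` examples with the most uncertain predictions.

    predictions: dict mapping an example id to its predicted class probabilities,
    e.g. produced by a model pretrained on regex-derived weak labels.
    """
    ranked = sorted(predictions, key=lambda k: entropy(predictions[k]), reverse=True)
    return ranked[:budget]


# Toy predictions (made up for the example).
preds = {"a": [0.98, 0.02], "b": [0.55, 0.45], "c": [0.80, 0.20]}
chosen = select_for_labeling(preds, budget=2)
```

In this toy case the near-certain example `a` is skipped, so annotation time is spent where the weakly supervised model is most likely to be wrong.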

Model Graph:

image

image

Result:

Thoughts:

Not applicable to my own research.

Next Reading:

human-in-the-loop (HIL) framework: Regular Expression Guided Entity Mention Mining from Noisy Web Data