BrambleXu / knowledge-graph-learning

A curated list of awesome knowledge graph tutorials, projects and communities.
MIT License

Book-2012-Natural Language Annotation for Machine Learning #322

Open BrambleXu opened 3 years ago

BrambleXu commented 3 years ago

Summary:

This is a book about annotation.

Resource:

Paper information:

Notes:

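I realized that my understanding of active learning (AL) was wrong. I had always assumed AL was a tool for annotation, but it is really a method for training a model. The goal of AL is not to annotate as much data as possible: once the model's accuracy is high enough, there is no need to keep annotating.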
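The point above (query labels only until the model is good enough) can be sketched as a pool-based active learning loop with uncertainty sampling. This is a minimal toy sketch, not the book's algorithm: the 1-D data, the `oracle` function standing in for a human annotator, the threshold classifier, and the `target_acc` stopping value are all assumptions made for illustration. It also assumes a small labeled dev set is available to measure accuracy.

```python
def oracle(x):
    """Stand-in for a human annotator (hypothetical): true label is 1 when x > 0.7."""
    return 1 if x > 0.7 else 0

def fit_threshold(labeled):
    """1-D classifier: midpoint between the largest 0-labeled and smallest 1-labeled point."""
    max_zero = max(x for x, y in labeled if y == 0)
    min_one = min(x for x, y in labeled if y == 1)
    return (max_zero + min_one) / 2

def accuracy(threshold, data):
    """Fraction of (x, y) pairs where the rule 'x > threshold' agrees with the label."""
    return sum((x > threshold) == (y == 1) for x, y in data) / len(data)

def active_learning(pool, dev_set, seed, target_acc=0.97, max_queries=50):
    """Label one point at a time; stop as soon as dev accuracy reaches target_acc."""
    labeled, pool, queries = list(seed), list(pool), 0
    while True:
        threshold = fit_threshold(labeled)
        acc = accuracy(threshold, dev_set)
        # Stopping rule: the model is good enough, so no more annotation is needed.
        if acc >= target_acc or queries >= max_queries or not pool:
            return threshold, acc, queries
        # Uncertainty sampling: query the unlabeled point closest to the boundary.
        x = min(pool, key=lambda p: abs(p - threshold))
        pool.remove(x)
        labeled.append((x, oracle(x)))
        queries += 1

# Toy setup: 199 unlabeled points, a 100-point labeled dev set, two seed labels.
pool = [i / 200 for i in range(1, 200)]
dev_set = [(x, oracle(x)) for x in ((i + 0.5) / 100 for i in range(100))]
seed = [(0.0, 0), (1.0, 1)]

threshold, acc, queries = active_learning(pool, dev_set, seed)
print(f"threshold={threshold:.4f} accuracy={acc:.2f} queries={queries}")
```

On this toy data the loop stops after only a handful of queries, leaving almost all of the pool unlabeled, which is the behavior the note describes: annotation stops once accuracy is sufficient, not once the data runs out.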

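Regarding Chapter 12 (outlined below): crowdsourced annotation is a trend for the future. For large amounts of data, boosting, active learning, and semi-supervised learning are three possible approaches.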

Handling Big Data

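The problems discussed earlier were all on the annotation side; this part is on the ML side. The key strategy is to make the most of a small amount of annotated data and to exploit the properties of Big Data. In the book's words: "The strategy shared by all of the approaches we’ll cover in this section is to try to make the best of as little annotated (training) data as possible, and to leverage different properties of the Big Data."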
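One way to picture "making the best of as little annotated data as possible" is self-training, a common form of semi-supervised learning. The sketch below is an illustrative assumption rather than the book's procedure: the nearest-centroid classifier, the 1-D toy data, and the `min_margin` confidence cutoff are all hypothetical choices. The model pseudo-labels only the unlabeled points it is confident about, retrains, and leaves ambiguous points alone.

```python
def centroids(labeled):
    """Mean of each class in 1-D; assumes both classes are present."""
    means = {}
    for label in (0, 1):
        xs = [x for x, y in labeled if y == label]
        means[label] = sum(xs) / len(xs)
    return means

def predict(means, x):
    """Return (label, margin): the nearest centroid and how much closer it is."""
    d0, d1 = abs(x - means[0]), abs(x - means[1])
    return (0 if d0 < d1 else 1), abs(d0 - d1)

def self_train(labeled, unlabeled, min_margin=0.3, max_rounds=10):
    """Grow the labeled set with confident pseudo-labels until none are left."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        means = centroids(labeled)
        scored = [(x, predict(means, x)) for x in unlabeled]
        confident = [(x, lab) for x, (lab, margin) in scored if margin >= min_margin]
        if not confident:
            break  # nothing left that the model is sure about
        labeled.extend(confident)
        chosen = {x for x, _ in confident}
        unlabeled = [x for x in unlabeled if x not in chosen]
    return centroids(labeled), labeled, unlabeled

# Two human-labeled seeds plus unlabeled points; 0.45 is deliberately ambiguous.
seed = [(0.0, 0), (1.0, 1)]
unlabeled = [0.1, 0.15, 0.2, 0.25, 0.3, 0.45, 0.7, 0.75, 0.8, 0.85, 0.9]

means, labeled, remaining = self_train(seed, unlabeled)
print(f"labeled={len(labeled)} still_unlabeled={remaining}")
```

The margin cutoff is the main design choice here: a lower value labels more data automatically but risks reinforcing the model's own mistakes, so in practice it would need tuning against held-out annotated data.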

Definition of Big Data: volume, velocity, and variety.

Boosting

Active Learning


Semi-Supervised Learning


12. Afterword: The Future of Annotation

  1. Crowdsourcing Annotation
    1. Amazon’s Mechanical Turk
    2. Games with a Purpose (GWAP)
    3. User-Generated Content
  2. Handling Big Data
    1. Boosting
    2. Active Learning
    3. Semi-Supervised Learning
  3. NLP Online and in the Cloud
    1. Distributed Computing
    2. Shared Language Resources
    3. Shared Language Applications

Model Graph:

Result:

Thoughts:

Next Reading: