BrambleXu / knowledge-graph-learning

A curated list of awesome knowledge graph tutorials, projects and communities.
MIT License

EBOOK-2018-An Introduction to Active Learning #329

Open BrambleXu opened 3 years ago

BrambleXu commented 3 years ago

Summary:

Introductory material on active learning from Figure Eight.

Resource:

Paper information:

Notes:

Topic: label quickly, and at the same time label the samples that are more representative.


Not all examples carry the same quality of information. Some data is going to be redundant. Identifying the best instances to train a model happens at two key times: before the model is even built and while the model is being trained. The former is called “prioritization.” The latter is called “active learning.”


Active learning is a prime example of the marriage of human and machine intelligence. Humans provide the labels that train the model (labeling faster), the model decides what labels it needs to improve (labeling smarter), and humans again provide those labels.
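The label, decide, label-again cycle described above can be sketched as a small loop. This is only a toy illustration, not the ebook's method: the one-dimensional threshold "model", the `oracle` callback standing in for the human annotator, and the names `fit_threshold` and `active_learning_loop` are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def fit_threshold(labeled: List[Tuple[float, int]]) -> float:
    """Toy model: midpoint between the largest 0-labeled and smallest 1-labeled point."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    lo = max(zeros) if zeros else 0.0  # assumes inputs lie in [0, 1]
    hi = min(ones) if ones else 1.0
    return (lo + hi) / 2.0


def active_learning_loop(
    pool: List[float],
    oracle: Callable[[float], int],
    budget: int,
) -> Tuple[float, List[Tuple[float, int]]]:
    """Repeatedly ask the oracle (the human) for the label the model wants most."""
    labeled: List[Tuple[float, int]] = []
    unlabeled = list(pool)
    threshold = fit_threshold(labeled)
    for _ in range(budget):
        if not unlabeled:
            break
        # The model decides: the point nearest the boundary is the least certain.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        # The human provides the label.
        labeled.append((query, oracle(query)))
        # The model is refit on everything labeled so far.
        threshold = fit_threshold(labeled)
    return threshold, labeled


if __name__ == "__main__":
    pool = [i / 100 for i in range(101)]
    threshold, labeled = active_learning_loop(pool, lambda x: int(x > 0.37), budget=12)
    print(f"estimated threshold {threshold:.3f} after {len(labeled)} labels")
```

In this toy setting the loop brackets the true boundary with a dozen labels out of 101 candidates, which is the "labeling smarter" half of the quote.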

How active learning works

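Whether a particular sample is selected depends on the difference between the cost of obtaining it and the quality of the information it brings.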
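A minimal sketch of that cost-versus-information decision, under loud assumptions: the notes do not say how information quality is measured, so the example uses the entropy of the model's predicted probability as a stand-in, and `value_per_bit` is a hypothetical conversion factor that puts information and labeling cost on the same scale.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary prediction with P(positive) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def should_query(p_positive: float, label_cost: float, value_per_bit: float = 1.0) -> bool:
    """Ask for a label only when the estimated information outweighs its cost.

    Entropy as the information measure and value_per_bit are assumptions
    for illustration; they do not come from the source.
    """
    expected_gain = value_per_bit * binary_entropy(p_positive)
    return expected_gain - label_cost > 0.0


if __name__ == "__main__":
    print(should_query(0.50, label_cost=0.4))  # model is unsure
    print(should_query(0.99, label_cost=0.4))  # model is already confident
```

With the same labeling cost, the uncertain sample is worth querying and the confident one is not.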

Below are three online, streaming-style sampling methods:

Of the three active learning strategies, pooling is the most practical.

HOW DOES AN ACTIVE LEARNER DECIDE WHICH ROW TO LABEL?

Everything below applies to the pooling method: how do we choose the next piece of data to label?
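One common answer for the pooling setting is uncertainty sampling: score every unlabeled row by how unsure the current model is, then label the top-scoring row. The notes do not record which strategies the ebook lists, so the two scores below (least confidence and entropy) and the helper `select_next` are illustrative assumptions rather than the ebook's own list.

```python
import math
from typing import Callable, Dict, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """1 minus the probability of the most likely class; higher means less sure."""
    return 1.0 - max(probs)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits of the predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def select_next(
    pool_probs: Dict[str, Sequence[float]],
    score: Callable[[Sequence[float]], float] = entropy,
) -> str:
    """Return the id of the unlabeled row with the highest uncertainty score."""
    return max(pool_probs, key=lambda row_id: score(pool_probs[row_id]))


if __name__ == "__main__":
    # Hypothetical class probabilities predicted for three unlabeled rows.
    pool_probs = {"a": [0.9, 0.1], "b": [0.55, 0.45], "c": [0.7, 0.3]}
    print(select_next(pool_probs))
    print(select_next(pool_probs, score=least_confidence))
```

Both scores pick row `b` here, because its predicted probabilities are closest to an even split.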

The models above share some common components:

These different strategies can in fact reinforce one another.
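One simple way strategies can reinforce each other is a weighted blend of their scores. The sketch below assumes two per-row scores already scaled to [0, 1], an uncertainty score and a diversity score (for example, distance from the rows already labeled). Both score names and the `weight` parameter are hypothetical choices for illustration, not something the notes specify.

```python
from typing import Dict, Tuple


def combined_score(uncertainty: float, diversity: float, weight: float = 0.5) -> float:
    """Blend two strategy scores; weight=1.0 uses uncertainty only, 0.0 diversity only."""
    return weight * uncertainty + (1.0 - weight) * diversity


def pick_row(candidates: Dict[str, Tuple[float, float]], weight: float = 0.5) -> str:
    """Return the row id whose blended (uncertainty, diversity) score is highest."""
    return max(candidates, key=lambda row_id: combined_score(*candidates[row_id], weight=weight))


if __name__ == "__main__":
    # Hypothetical (uncertainty, diversity) scores for three unlabeled rows.
    candidates = {"a": (0.9, 0.1), "b": (0.6, 0.8), "c": (0.2, 0.9)}
    print(pick_row(candidates))              # balanced blend
    print(pick_row(candidates, weight=1.0))  # uncertainty only
    print(pick_row(candidates, weight=0.0))  # diversity only
```

The balanced blend picks a row that neither single strategy would rank first, which is the sense in which the strategies complement each other.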

What are the best use cases for active learning? Is it right for your project?


Choosing an active learning strategy early pays off more than building the model early.

WHEN ACTIVE LEARNING MIGHT NOT BE THE RIGHT CHOICE

Below are some situations where active learning is not a good fit:

ARE THERE PARTICULAR DATA TYPES THAT WORK BEST FOR ACTIVE LEARNING?

Active learning can work for any application: NLP, computer vision, speech-to-text, video.

SOME FINAL THOUGHTS

The fact remains that active learners get better accuracy with fewer rows than generic supervised approaches. And that’s never a bad thing. Especially when it frees up a little budget for the R&D project you’ve been waiting to try.