BrambleXu / knowledge-graph-learning

A curated list of awesome knowledge graph tutorials, projects and communities.

arXiv-2019/8-Simplify the Usage of Lexicon in Chinese NER #280

BrambleXu commented 4 years ago

Summary:

Lattice-LSTM (#279) is computationally inefficient, so this paper aims to incorporate lexicon information into the character representations in a more efficient way.

Resource:

Paper information:

Notes:

The authors analyze the advantages and disadvantages of Lattice-LSTM.

Advantages:

The idea of this paper is to keep the advantages above while discarding the LSTM architecture. Instead, the authors propose a new encoding scheme.

Each character c in a sentence s has four corresponding word sets, labeled with the four "BMES" tags (Begin, Middle, End, Single): the matched lexicon words in which c appears at the beginning, in the middle, at the end, or as a single-character word.

If a set is empty, its only member is NONE.

Consider the sentence s = {c1, · · · , c5} and suppose that {c1, c2}, {c1, c2, c3}, {c2, c3, c4}, and {c2, c3, c4, c5} match the lexicon. Then, for c2, B(c2) = {{c2, c3, c4}, {c2, c3, c4, c5}}, M(c2) = {{c1, c2, c3}}, E(c2) = {{c1, c2}}, and S(c2) = {NONE}.

Taking B(c2) = {{c2, c3, c4}, {c2, c3, c4, c5}} from this example, a concrete instance would be B(“南”) = {南京市, 南京大桥}.
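To make the set construction concrete, here is a minimal Python sketch; the function name, the `max_word_len` cap, and the `<NONE>` placeholder token are my own choices, not taken from the paper:

```python
def bmes_word_sets(sentence, lexicon, max_word_len=5):
    """For each character, collect the matched lexicon words in which it
    appears at the Beginning, in the Middle, at the End, or as a Single word."""
    n = len(sentence)
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, min(n, i + max_word_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[i]["S"].add(word)
            else:
                sets[i]["B"].add(word)         # first character of the word
                sets[j - 1]["E"].add(word)     # last character of the word
                for k in range(i + 1, j - 1):  # interior characters
                    sets[k]["M"].add(word)
    for char_sets in sets:                     # empty sets get the NONE member
        for s in char_sets.values():
            if not s:
                s.add("<NONE>")
    return sets

# With this toy lexicon, B("南") = {"南京", "南京市"}.
sets = bmes_word_sets("南京市长江大桥", {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"})
print(sets[0]["B"])
```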

[image: formal definitions of the four word sets B(c), M(c), E(c), S(c)]

The v^s part is a mapping function that turns a word set into a fixed-dimensional vector. Mean-pooling is introduced here to compute the vector representation of a word set S:

$$v^s(S) = \frac{1}{|S|} \sum_{w \in S} e^w(w)$$

where $e^w$ denotes the word embedding lookup.
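As a quick illustration, mean-pooled v^s takes only a couple of lines; `embed` as a dict of NumPy vectors is my own stand-in for the paper's word-embedding lookup table:

```python
import numpy as np

def mean_pool(word_set, embed, dim=50):
    """v^s(S): average the embeddings of all words in set S.
    All-<NONE> sets fall back to a zero vector."""
    vecs = [embed[w] for w in word_set if w in embed]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy usage with random 50-d embeddings (illustrative only).
embed = {w: np.random.randn(50) for w in ["南京", "南京市"]}
v = mean_pool({"南京", "南京市"}, embed)
```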

But mean-pooling does not perform well. Lattice-LSTM used a dynamical weighting algorithm; to keep the model fast, this paper instead uses the frequency of each word as an indication of its weight. The basic idea behind this is that the more times a character sequence occurs in the data, the more likely it is a word. Note that the frequency of a word is a static value that can be obtained offline, which greatly accelerates the weight calculation (e.g., via a lookup table).

$$v^s(S) = \frac{4}{Z} \sum_{w \in S} z(w)\, e^w(w), \qquad Z = \sum_{w \in B(c) \cup M(c) \cup E(c) \cup S(c)} z(w)$$

where $z(w)$ is the static frequency of $w$.
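A sketch of the frequency-weighted version for one character's four sets, following the 4/Z normalization above; the dict-based `freq`/`embed` lookups and the final concatenation are assumptions on my part (the four pooled vectors are joined into the character representation):

```python
import numpy as np

def weighted_pool(char_sets, embed, freq, dim=50):
    """Frequency-weighted v^s over one character's B/M/E/S word sets.
    z(w) is a static corpus frequency, so Z can be precomputed offline."""
    # Z normalizes over the union of all four word sets of this character.
    Z = sum(freq.get(w, 0) for tag in "BMES" for w in char_sets[tag])
    pooled = []
    for tag in "BMES":
        v = np.zeros(dim)
        for w in char_sets[tag]:
            if w in embed:                     # skips the <NONE> placeholder
                v += freq.get(w, 0) * embed[w]
        pooled.append(4.0 / max(Z, 1) * v)
    # The four pooled vectors are concatenated, shape (4 * dim,).
    return np.concatenate(pooled)
```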

The weights of infrequent words are also specifically raised:

[image: smoothed weighting that raises the weights of infrequent words]

Model Graph:

Result:

Thoughts:

Next Reading: