Background: Several recent efforts, under the theme of end-to-end relation extraction (E2ERE), seek to exploit inter-task correlations by modeling the NER and RE tasks jointly.
Problem: Earlier work in this area commonly reduces the task to a table-filling problem, in which an additional expensive decoding step involving beam search is applied to obtain globally consistent cell labels. Even in efforts that do not employ table filling, global optimization in the form of CRFs with Viterbi decoding for the NER component is still necessary for competitive performance.
Proposal: We introduce a novel neural architecture that utilizes the table structure, based on repeated applications of 2D convolutions for pooling local dependency and metric-based features, without the need for global optimization.
Effect: A 1% gain in F-score over prior best results, with training and testing times that are nearly four times faster; the latter is highly advantageous for time-sensitive end-user applications.
One-sentence summary:
Solves the end-to-end relation extraction problem by combining metric learning with CNNs, while substantially reducing training and testing time.
Resources:
Paper info:
Notes:
Recent DL approaches to E2ERE fall into two categories:
1. Approaches that use DL to learn the table structure, first introduced by Miwa and Sasaki (2014), including Gupta et al. (2016), Pawar et al. (2017), and Zhang et al. (2017). Here the E2ERE problem is reduced to a table-filling problem. In recent table-structure work, the relation in a given cell is predicted from neighboring cells, so the table is filled incrementally, leading to potential efficiency issues. Moreover, these methods require an additional computationally expensive decoding step, involving beam search, to obtain a globally optimal table-wide label assignment.
2. Approaches that train NER and RE jointly without the table structure. Here a CRF with Viterbi decoding to find the best label sequence is still used as a component of the NER model (Bekoulis et al., 2018b,a).
Our model utilizes the table formulation by embedding features along the third dimension, i.e., each cell of the table holds a feature vector rather than a single label. By using deep features, we overcome the efficiency issue; "deep features" here refers to local metric, dependency, and position features. Preliminary decisions are made in earlier layers, and the final label predictions for both NER and RE are made with softmax layers. Since Bekoulis et al. (2018a) applied adversarial training (AT) to RE, we also explore the effect of AT as a regularization method.
Thus, our model is expected to improve over earlier efforts without a costly decoding step. (This is the paper's central claim.)
Model figure:
3.2 Our Model: Relation-Metric Network
3.2.1 Context Embeddings Layer
Word embeddings are replaced with character-CNN based representations.
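A minimal sketch of what such a character-CNN token encoder might look like; all sizes below are illustrative guesses, not the paper's hyperparameters:

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Builds one vector per token by convolving over its characters and
    max-pooling over positions. Sizes are assumptions, not the paper's."""
    def __init__(self, char_vocab_size=100, char_emb_dim=25, num_filters=50, width=3):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab_size, char_emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_emb_dim, num_filters, kernel_size=width, padding=width // 2)

    def forward(self, char_ids):
        # char_ids: (batch, num_tokens, max_chars) integer character ids
        b, t, c = char_ids.shape
        x = self.char_emb(char_ids.view(b * t, c))   # (b*t, chars, emb)
        x = self.conv(x.transpose(1, 2))             # (b*t, filters, chars)
        x = torch.relu(x).max(dim=2).values          # max-pool over character positions
        return x.view(b, t, -1)                      # (batch, num_tokens, filters)
```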
3.2.2 Relation-Metric Learning
3.2.3 Dependency Embeddings Table
3.2.4 Position Embeddings Table
Exactly the same as in the 2014 paper.
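A hypothetical sketch of how a position embeddings table could be formed under the table formulation: each cell (i, j) looks up an embedding of the clipped relative offset j − i. The clipping range and dimension are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PositionTable(nn.Module):
    """Hypothetical sketch: cell (i, j) holds an embedding of the relative
    offset j - i, clipped to [-max_dist, max_dist]."""
    def __init__(self, max_dist=10, emb_dim=8):
        super().__init__()
        self.max_dist = max_dist
        self.emb = nn.Embedding(2 * max_dist + 1, emb_dim)

    def forward(self, n):
        idx = torch.arange(n)
        offsets = (idx[None, :] - idx[:, None]).clamp(-self.max_dist, self.max_dist)
        return self.emb(offsets + self.max_dist)  # (n, n, emb_dim)
```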
3.4 Adversarial Training (AT)
AT is a regularization method used in addition to standard dropout. The rough idea is to generate adversarial examples that the model is likely to misclassify and train on them to improve the model's robustness: worst-case perturbations are applied to existing examples, where a perturbation is defined as the change to the input that maximizes the loss. Bekoulis et al. (2018a) showed that AT works very well for E2ERE, so we apply AT to our model as well. Unlike Bekoulis et al. (2018a), however, we generate adversarial examples at the relation metric table level, corresponding to the hidden representation G, instead of at the word embedding level.
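A minimal sketch of one-step (FGSM-style) adversarial training applied at a hidden representation rather than the word embeddings; this is the standard recipe, not necessarily the paper's exact procedure, and `head`, `loss_fn`, and `epsilon` are placeholders:

```python
import torch

def adversarial_loss(head, G, labels, loss_fn, epsilon=1e-2):
    """Sketch: perturb the hidden representation G in the direction that
    maximizes the loss, then recompute the loss on the perturbed G.
    `head` stands in for the layers above G."""
    loss = loss_fn(head(G), labels)
    # worst-case perturbation: step along the normalized loss gradient
    # w.r.t. G, treated as a constant (hence the detach)
    grad, = torch.autograd.grad(loss, G, retain_graph=True)
    r_adv = epsilon * grad.detach() / (grad.detach().norm() + 1e-12)
    return loss_fn(head(G + r_adv), labels)
```

In a typical AT setup, this adversarial loss would be added to the standard loss before backpropagation.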
Results:
Papers to read next:
Giannis Bekoulis, Johannes Deleu, Thomas Demeester, and Chris Develder. 2018a. Adversarial training for multi-context joint entity and relation extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2830–2836. #41
Giannis Bekoulis, Johannes Deleu, Thomas Demeester, and Chris Develder. 2018b. Joint entity recognition and relation extraction as a multi-head selection problem. Expert Systems with Applications, 114:34–45. #193