cl-tohoku / showcase_miyawaki


Multi-Task Semantic Dependency Parsing with Policy Gradient for Learning Easy-First Strategies #5

Closed smiyawaki0820 closed 3 years ago

smiyawaki0820 commented 4 years ago

1. What is it?

(Task)

(Proposal)

(Results)

2. What's novel compared to prior work?

IPS algorithm

graph-based

transition-based

(Proposed method) graph-based + transition-based

3. What is the key to the technique?

IPS algorithm

IPS model 🤔

Rewards of Policy Gradient

multi-task learning

Several linguistic formalisms exist for SDP, with points of overlap / synergy between them

4. How was effectiveness validated?

Ablations validate the effectiveness of multi-task learning and reinforcement learning

main results

Arc length distributions for RL

5. Any discussion?

6. Which papers to read next?

smiyawaki0820 commented 4 years ago

paper

abstract

In Semantic Dependency Parsing (SDP), semantic relations form directed acyclic graphs, rather than trees. We propose a new iterative predicate selection (IPS) algorithm for SDP. Our IPS algorithm combines the graph-based and transition-based parsing approaches in order to handle multiple semantic head words. We train the IPS model using a combination of multi-task learning and task-specific policy gradient training. Trained this way, IPS achieves a new state of the art on the SemEval 2015 Task 18 datasets. Furthermore, we observe that policy gradient training learns an easy-first strategy.

task: SDP ... semantic relations form directed acyclic graphs (rather than trees)

bib

@inproceedings{kurita-sogaard-2019-multi,
    author = {Kurita, Shuhei and S{\o}gaard, Anders},
    title = {Multi-Task Semantic Dependency Parsing with Policy Gradient for Learning Easy-First Strategies},
    booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
    year = {2019},
    publisher = {Association for Computational Linguistics},
    url = {https://www.aclweb.org/anthology/P19-1232},
    doi = {10.18653/v1/P19-1232},
    pages = {2420--2430},
}
smiyawaki0820 commented 4 years ago

paper

abstract

task: Semantic Dependency Parsing(SDP)

propose: Iterative Predicate Selection (IPS) algorithm

train IPS model

Result & Analysis

smiyawaki0820 commented 4 years ago

1. Introduction

SDP

IPS algorithms

| parser | scoring | notes | characteristics |
| --- | --- | --- | --- |
| transition-based | over transitions between states | builds the dep-graph incrementally | error propagation (unsuitable for long dependencies) |
| graph-based | over all edges | adopts a tree decoding algorithm | |

Contributions

  1. A new SDP parsing algorithm that integrates the transition-based and graph-based approaches
  2. Multi-task learning of this parsing algorithm outperforms single-task learning
  3. Task-specific policy gradient fine-tuning further improves the model
  4. SOTA on three formalisms
  5. Policy gradient fine-tuning learns along an easy-first strategy
smiyawaki0820 commented 4 years ago

Related Work

transition-based parsing algorithms

graph-based parsing algorithms

transition-based parsers w/ reinforcement learning

Zhang and Chan, 2009

Fried and Klein, 2018

Lee et al., 2018

smiyawaki0820 commented 4 years ago

Model

Iterative Predicate Selection (IPS)

(Proposal) a new SDP algorithm based on the head-selection algorithm (Zhang et al., 2017)

Proposed algorithm

how to create semantic dependency arcs

  1. For each word w_i, select a head arc from the candidate transitions T_i^τ
  2. Update the partial semantic dep-graph
  3. If all words selected NULL, terminate; otherwise go to step 1
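The three-step loop above can be sketched as follows (a minimal sketch; `ips_parse` and `score_transitions` are hypothetical names, and the toy scorer stands in for the paper's neural scoring model):

```python
NULL = None  # the "select no further head" transition

def ips_parse(words, score_transitions):
    """Minimal Iterative Predicate Selection loop: each word repeatedly
    selects a head arc (or NULL) until every word selects NULL."""
    graph = {i: set() for i in range(len(words))}  # word index -> set of heads
    while True:
        chosen = []
        for i in range(len(words)):
            # Candidate transitions T_i: NULL plus heads not yet attached to w_i.
            candidates = [NULL] + [j for j in range(len(words))
                                   if j != i and j not in graph[i]]
            t = score_transitions(i, candidates, graph)  # step 1: pick a head arc
            chosen.append(t)
            if t is not NULL:
                graph[i].add(t)                          # step 2: update partial graph
        if all(t is NULL for t in chosen):               # step 3: stop when all pick NULL
            return graph

# Toy scorer: attach word 0 as a head of every other word once, then stop.
def toy_scorer(i, candidates, graph):
    return 0 if (i != 0 and 0 in candidates) else NULL
```

For example, `ips_parse(["root", "a", "b"], toy_scorer)` attaches word 0 as the head of words 1 and 2 in the first iteration, then terminates when every word selects NULL.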

non-deterministic oracle problem: there are several paths, depending on the order in which the arcs are created

In IPS parsing, the difficulty differs from path to path

Note: for sequence taggers, its effectiveness has been proven

In this paper ...

smiyawaki0820 commented 4 years ago

Neural Model


Sentence Encoder

Encoder of partial SDP graphs

dep-flags: F'

Predicate Selection Model

transition score:


For supervised learning ... cross entropy loss
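The supervised objective can be sketched as a softmax over a word's candidate-transition scores, followed by the negative log-probability of the gold transition (a NumPy sketch with assumed function names; the paper's actual scorer is a neural network):

```python
import numpy as np

def transition_probs(scores):
    """p_i(t_j): softmax over one word's candidate-transition scores."""
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

def cross_entropy_loss(scores, gold):
    """Supervised loss: -log p of the gold transition (always non-negative)."""
    return -np.log(transition_probs(scores)[gold])
```

The loss is smallest when the gold transition already has the highest score, and it can never go below zero.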

Labeling model

also develop a semantic dep-labeling NN

score of label l for the arc from predicate j to word i

smiyawaki0820 commented 4 years ago

Reinforcement Learning

Policy gradient

Williams, 1992

a method for learning to iteratively act according to a dynamic environment in order to optimize future rewards

  • the agent ~ NN model predicting the transition probabilities p_i(t_j^τ)
  • the environment ~ include the partial SDP graph y^τ
  • the rewards ~ computed by comparing the predicted parse graph to the gold parse graph y^g

objective function: maximize the expected rewards


the transition policy for the w_i

PG learning algorithm for SDP

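A REINFORCE-style update (Williams, 1992) can be sketched as sampling a transition from the policy and weighting its negative log-probability by the reward (a toy sketch for a single decision; the paper applies this per word across the whole parse):

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(scores):
    """Transition policy p_i(t_j) as a softmax over scores."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def pg_loss(scores, reward):
    """Policy gradient loss for one sampled transition: -reward * log p(t).
    Unlike cross entropy, this is negative whenever the reward is negative."""
    p = policy(scores)
    t = rng.choice(len(scores), p=p)  # sampling lets the model explore paths
    return -reward * np.log(p[t])
```

Because transitions are sampled rather than taken from a gold oracle, the model can explore transition paths that supervised training would never follow.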

Are the cross entropy loss and the policy gradient loss similar?

No.

| | reinforcement learning (policy gradient loss) | supervised learning (cross entropy loss) |
| --- | --- | --- |
| sampling of transitions | allows the model to explore transition paths | never follows sampled paths |
| decisions | dependent | independent (θ updated after parsing finishes) |
| loss | can be negative | non-negative |

Rewards for SDP

intermediate rewards (r_i^τ): given during parsing, at different τ

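A toy version of an intermediate reward at one parsing step (an assumed ±1 scheme for illustration only; the paper defines its own reward values):

```python
def intermediate_reward(word, transition, gold_heads):
    """Toy intermediate reward r_i^tau (assumed scheme, not the paper's
    exact values): +1 for creating a gold arc, -1 for a wrong arc,
    0 for selecting NULL (no arc created)."""
    if transition is None:  # NULL transition: no arc is created
        return 0.0
    return 1.0 if transition in gold_heads.get(word, set()) else -1.0
```

The key point is that rewards are computed by comparing each created arc against the gold parse graph y^g during parsing, not only at the end.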
smiyawaki0820 commented 4 years ago

Implementation Details

smiyawaki0820 commented 4 years ago

Experiments

(Comparative experiments) IPS + ML + RL

(At inference time)

smiyawaki0820 commented 4 years ago

Results


Evaluating Our Parser w/o Lemma


Effect of Reinforcement Learning
