JudePark96 / paper-summaries

An archive of my paper summaries.

🚀 [2020] Exclusive Hierarchical Decoding for Deep Keyphrase Generation #2

Open JudePark96 opened 4 years ago

JudePark96 commented 4 years ago
Paper: Exclusive Hierarchical Decoding for Deep Keyphrase Generation
Authors: Wang Chen, Hou Pong Chan, Piji Li, Irwin King
Link: https://arxiv.org/abs/2004.08511
Venue: ACL 2020

Contents

1. μ΄ˆλ‘μ€ 뭐라고 λ§ν•˜κ³  μžˆμ–΄ ?

2. μ£Όμš” 기여점은 뭐야 ?

3. μ΄μ „μ˜ μ ‘κ·Όκ³ΌλŠ” 뭐가 λ‹€λ₯Έ 것 κ°™μ•„ ?

4. μ–΄λ–€ κ±Έ μ œμ•ˆν•  수 μžˆμ„κΉŒ ?

5. λ‹€μŒ 논문은 무엇을 μ½μ–΄μ•Όν• κΉŒ ?

JudePark96 commented 4 years ago

1. μ΄ˆλ‘μ€ 뭐라고 λ§ν•˜κ³  μžˆμ–΄ ?

Recent approaches require the model not only to predict keyphrases but also to decide how many keyphrases to generate. These approaches rely on a sequential decoding process. However, such a process ignores the intrinsic hierarchical compositionality of keyphrases. Moreover, previous approaches tend to generate duplicated keyphrases, which wastes computing resources and time.

μ΄λŸ¬ν•œ λ¬Έμ œμ μ„ κ·Ήλ³΅ν•˜κΈ° μœ„ν•΄ λ³Έ λ…Όλ¬Έμ—μ„œλŠ” exclusive hierarchical decoding framework that includes a hierarchical decoding process and either a soft or a hard exclusion mechanism 을 μ œμ•ˆν•œλ‹€.

JudePark96 commented 4 years ago

2. What are the main contributions?

As mentioned above, generating keyphrases with a sequential decoding method has two problems: it ignores the hierarchical compositionality of keyphrases, and it tends to produce duplicates.

λ³Έ λ…Όλ¬Έμ—μ„œλŠ” μœ„μ˜ λ¬Έμ œλ“€μ„ κ·Ήλ³΅ν•˜κΈ° μœ„ν•΄ Novel exclusive hierarchical decoding framework λ₯Ό μ œμ•ˆν•¨.

Methodology


Figure 2: Illustration of exclusive hierarchical decoding. $h_i$ is the hidden state of the $i$-th PD (phrase-level decoding) step. $h_{i,j}$ corresponds to the hidden state of the $j$-th WD (word-level decoding) step. The [neopd] token means PD has not ended. The [eowd] token means WD terminates. The [eopd] token means PD has ended and the whole decoding process is finished. $[m_1, \dots, m_{l_x}]$ are the hidden states encoded from the document. PD-Attention and WD-Attention are the attention mechanisms used in PD and WD, respectively. $\beta_i$ is the PD attention score at the $i$-th step. $\hat{h}_{i,j}$ is the WD attentional vector. EL/ES indicates that exclusive loss or exclusive search is used.

Sequential Encoder

To obtain context-aware representations, a two-layered bi-directional GRU is used as the encoder.

Many papers on keyphrase generation use a GRU as the encoder. A minimal sketch of such an encoder is shown below.
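
As a rough illustration, here is a minimal PyTorch sketch of such an encoder; the dimensions, class name, and usage are my own assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class SequentialEncoder(nn.Module):
    """Two-layer bi-directional GRU encoder (illustrative sketch)."""

    def __init__(self, vocab_size: int, emb_dim: int = 100, hidden_dim: int = 150):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, num_layers=2,
                          bidirectional=True, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, l_x) -> memory bank [m_1, ..., m_{l_x}]:
        # (batch, l_x, 2 * hidden_dim), i.e. context-aware representations.
        embedded = self.embedding(token_ids)
        memory_bank, _ = self.gru(embedded)
        return memory_bank

# The encoded states serve as the [m_1, ..., m_{l_x}] attended to by PD and WD.
encoder = SequentialEncoder(vocab_size=50000)
m = encoder(torch.randint(0, 50000, (2, 20)))  # shape: (2, 20, 300)
```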

Phrase-level Decoder

The phrase-level decoder uses a uni-directional GRU.

$h_i = \mathrm{GRU}_{\mathrm{PD}}\bigl(\widetilde{h}_{i-1,\mathrm{end}},\ h_{i-1}\bigr)$

$\widetilde{h}_{i-1,\mathrm{end}}$ is the attentional vector produced by the WD steps of the $(i-1)$-th PD step. According to this formula, the hidden state that has passed through the WD steps is fed back into the phrase-level decoder as its next state, so the process operates recursively. From this representation and the encoder representations, the PD attention scores are then computed with the formulas below.

$\beta_{i,n} = \dfrac{\exp\bigl(\mathrm{score}(h_i, m_n)\bigr)}{\sum_{n'=1}^{l_x} \exp\bigl(\mathrm{score}(h_i, m_{n'})\bigr)} \qquad (2)$

$\mathrm{score}(h_i, m_n) = h_i^{\top} W_1 m_n \qquad (3)$

In eq. (3), $h_i$ is the PD hidden state, $W_1$ is a parameter matrix, and $m_n$ is an encoder representation. The score is a bi-linear transformation, and applying a softmax over the scores as in eq. (2) yields the attention weights.
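
As a minimal sketch of this bi-linear attention (the class name, shapes, and dimensions are my assumptions):

```python
import torch
import torch.nn as nn

class BilinearAttention(nn.Module):
    """Computes score(h_i, m_n) = h_i^T W_1 m_n and softmax-normalizes it (sketch)."""

    def __init__(self, decoder_dim: int, encoder_dim: int):
        super().__init__()
        self.W1 = nn.Linear(encoder_dim, decoder_dim, bias=False)  # the W_1 of eq. (3)

    def forward(self, h_i: torch.Tensor, memory_bank: torch.Tensor) -> torch.Tensor:
        # h_i: (batch, decoder_dim); memory_bank: (batch, l_x, encoder_dim)
        scores = torch.bmm(self.W1(memory_bank), h_i.unsqueeze(2)).squeeze(2)
        return torch.softmax(scores, dim=-1)  # beta_i: (batch, l_x), as in eq. (2)
```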

Word-level Decoder

$h_{i,j} = \mathrm{GRU}_{\mathrm{WD}}\bigl(e_{i,j-1},\ h_{i,j-1}\bigr)$

$i$ indexes the PD step and $j-1$ the WD step. The GRU takes the embedding $e_{i,j-1}$ of the previously generated word together with the previous hidden state $h_{i,j-1}$ and produces $h_{i,j}$.


νŠΉμ΄μ μ€ PD-Attention Score λ₯Ό ν†΅ν•˜μ—¬ WD-Attention Score λ₯Ό scale ν•˜κ³  μžˆλ‹€λŠ” 점이닀.


The resulting attentional hidden state is used for decoding, employing a copy mechanism so that words can be copied directly from the source document.
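
The paper uses a copy mechanism; as a rough pointer-generator-style sketch (the exact parameterization here is an assumption, not the paper's):

```python
import torch
import torch.nn as nn

class CopyMechanism(nn.Module):
    """Mixes a generation distribution with a copy distribution (illustrative)."""

    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.generate = nn.Linear(hidden_dim, vocab_size)
        self.p_gen = nn.Linear(hidden_dim, 1)

    def forward(self, h_attn, copy_attn, src_ids):
        # h_attn: (batch, hidden_dim), the attentional vector \hat{h}_{i,j}
        # copy_attn: (batch, l_x), attention weights over source positions
        # src_ids: (batch, l_x), source token ids
        p_gen = torch.sigmoid(self.p_gen(h_attn))                 # (batch, 1)
        gen_dist = torch.softmax(self.generate(h_attn), dim=-1)   # (batch, vocab)
        # Scatter the copy probabilities of source tokens onto the vocabulary.
        final = (p_gen * gen_dist).scatter_add(1, src_ids, (1 - p_gen) * copy_attn)
        return final  # (batch, vocab): distribution over the next word
```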

The WD process terminates when the [eowd] token is generated. The whole hierarchical decoding terminates when the [eopd] token is generated.
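
Putting the two levels and the special tokens together, the control flow looks roughly like this; pd_step, wd_step, and next_word are hypothetical toy stand-ins for the PD GRU, the WD GRU plus attention, and the copy-augmented output layer:

```python
import random

def pd_step(prev_attn, prev_pd):   # stands in for h_i = GRU_PD(h~_{i-1,end}, h_{i-1})
    return prev_attn + prev_pd

def wd_step(h_wd, h_pd):           # stands in for the WD GRU step plus attention
    return h_wd + 1.0, h_wd + h_pd

def next_word(h_attn):             # stands in for the copy-augmented output layer
    return random.choice(["word", "word", "word", "[eowd]", "[eopd]"])

def hierarchical_decode(max_phrases=20, max_words=6):
    keyphrases, h_pd, h_attn = [], 0.0, 0.0
    for _ in range(max_phrases):
        h_pd = pd_step(h_attn, h_pd)             # one PD step per keyphrase
        words, h_wd = [], h_pd
        for _ in range(max_words):
            h_wd, h_attn = wd_step(h_wd, h_pd)
            token = next_word(h_attn)
            if token == "[eowd]":                # WD terminates: phrase is complete
                break
            if token == "[eopd]":                # PD terminates: all decoding done
                return keyphrases
            words.append(token)
        if words:
            keyphrases.append(" ".join(words))
    return keyphrases

print(hierarchical_decode())
```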

JudePark96 commented 4 years ago

3. μ΄μ „μ˜ μ ‘κ·Όκ³ΌλŠ” 뭐가 λ‹€λ₯Έ 것 κ°™μ•„ ?

Previous approaches such as One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases mostly made their contributions to the decoding process. That paper's contributions, orthogonal regularization and semantic coverage, also concerned the decoding process. Still, the decoding process remained sequential, which was a limitation.

λ³Έ λ…Όλ¬Έμ—μ„œλŠ” μ΄λŸ¬ν•œ ν•œκ³„μ μ„ κ·Ήλ³΅ν•˜κΈ° μœ„ν•΄ decoding process λ₯Ό hierarchical process 둜 μ§„ν–‰ν–ˆλ‹€λŠ” 것이 main contribution 이라고 μƒκ°ν•œλ‹€.

JudePark96 commented 4 years ago

4. What could we propose?

Looking at this paper and the previous approaches, the contributions have mainly concerned the decoding process. I think the proposals that would provide differentiation are as follows.

…and the like.

JudePark96 commented 4 years ago

5. λ‹€μŒ 논문은 무엇을 μ½μ–΄μ•Όν• κΉŒ ?