https://arxiv.org/abs/1801.10198 — Generating Wikipedia by Summarizing Long Sequences, published as a conference paper at ICLR 2018
Abstract
1. Introduction
2. Related Work
2.1. Other datasets used in neural abstractive summarization
2.2. Tasks involving Wikipedia
3. English Wikipedia as a multi-document summarization dataset
Data augmentation in this paper: the source documents for each article are its cited references augmented with web search results, and the target is the article's lead section (a rough sketch of one example is below)
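A minimal sketch of how one example in such a dataset could be structured, purely illustrative (the field names are mine, not the paper's):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SummarizationExample:
    """One multi-document summarization example (hypothetical structure)."""
    title: str                                                # article title, also used as the query
    cited_sources: List[str] = field(default_factory=list)    # text of the article's cited references
    search_results: List[str] = field(default_factory=list)   # text of web search result pages
    target_lead: str = ""                                     # lead section of the Wikipedia article

    @property
    def input_documents(self) -> List[str]:
        # Cited references plus search results together form the input
        # handed to the extractive stage.
        return self.cited_sources + self.search_results
```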
4. Methods and models
4.1. Extractive stage
tf-idf performed best among the extractive methods in the extractive stage (see Table 3); a rough sketch of this kind of ranking follows
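A minimal sketch of tf-idf-based extraction, assuming scikit-learn and using the article title as the query; this is an illustration of the idea, not the paper's implementation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_paragraphs_by_tfidf(title, paragraphs, top_k=5):
    """Rank candidate paragraphs by tf-idf cosine similarity to the title
    (used as the query) and keep the top_k. Illustrative only."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the paragraphs plus the query so they share one vocabulary.
    matrix = vectorizer.fit_transform(list(paragraphs) + [title])
    paragraph_vecs, query_vec = matrix[:-1], matrix[-1]
    scores = cosine_similarity(paragraph_vecs, query_vec).ravel()
    ranked = sorted(zip(scores, paragraphs), key=lambda pair: pair[0], reverse=True)
    return [p for _, p in ranked[:top_k]]

# e.g. extracted = rank_paragraphs_by_tfidf("Article title", all_paragraphs, top_k=10)
```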
4.2. Abstractive stage
4.2.1. Data representation
4.2.2. Baseline models
T-ED: the standard Transformer encoder-decoder, used as a baseline model
4.2.3. Transformer Decoder (T-D)
T-D: a decoder-only Transformer that performs better than the baselines; the key point is the formulation below, where input and target are concatenated into one sequence and modeled as a plain language model
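Reconstructing that formulation from the paper (notation approximate): the source tokens, a separator, and the target tokens are flattened into a single sequence and trained with the usual autoregressive factorization.

```latex
% Concatenate source m, a separator token \delta, and target y into one sequence
% (w^1, \dots, w^{n+\eta+1}) = (m^1, \dots, m^n, \delta, y^1, \dots, y^\eta)
% and train a standard language model over it:
\[
  p(w^1, \dots, w^{n+\eta+1}) = \prod_{j} p\!\left(w^j \mid w^1, \dots, w^{j-1}\right)
\]
```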
4.2.4. Transformer decoder with memory-compressed attention (T-DMCA)
T-DMCA is the paper's final, best-performing model
Local attention: the sequence is split into blocks and self-attention is computed within each block independently, so memory does not grow with the full sequence length
Memory-compressed attention: keys and values are compressed with a strided convolution before attention, reducing the number of positions attended over
They use an LMLML layer ordering (L = local-attention layer, M = memory-compressed-attention layer); a rough sketch of both variants follows
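A minimal PyTorch-style sketch of the two attention variants as I understand them; block size, compression stride, and all names are illustrative, and the causal masking a decoder needs is omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def attention(q, k, v):
    # Standard scaled dot-product attention (no masking, for brevity).
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

class LocalAttention(nn.Module):
    """Split the sequence into fixed-size blocks and attend within each block."""
    def __init__(self, d_model, block_size=256):
        super().__init__()
        self.block_size = block_size
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        b, n, d = x.shape
        pad = (-n) % self.block_size            # pad so the length divides evenly
        x = F.pad(x, (0, 0, 0, pad))
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Fold blocks into the batch dimension and attend per block.
        fold = lambda t: t.reshape(b, -1, self.block_size, d).reshape(-1, self.block_size, d)
        out = attention(fold(q), fold(k), fold(v))
        out = out.reshape(b, -1, d)[:, :n]      # unfold and drop the padding
        return self.out(out)

class MemoryCompressedAttention(nn.Module):
    """Compress keys and values with a strided 1-D convolution before attending."""
    def __init__(self, d_model, stride=3):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.kv = nn.Linear(d_model, 2 * d_model)
        self.compress_k = nn.Conv1d(d_model, d_model, kernel_size=stride, stride=stride)
        self.compress_v = nn.Conv1d(d_model, d_model, kernel_size=stride, stride=stride)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        q = self.q(x)
        k, v = self.kv(x).chunk(2, dim=-1)
        # Conv1d wants (batch, channels, length), so transpose around it.
        k = self.compress_k(k.transpose(1, 2)).transpose(1, 2)
        v = self.compress_v(v.transpose(1, 2)).transpose(1, 2)
        return self.out(attention(q, k, v))

# T-DMCA stacks these layers in the order L, M, L, M, L ("LMLML").
```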
5. Experiments
5.1. Evaluation