https://arxiv.org/abs/1801.10198 — Generating Wikipedia by Summarizing Long Sequences, published as a conference paper at ICLR 2018
Abstract
1. Introduction
2. Related Work
2.1. Other datasets used in neural abstractive summarization
2.2. Tasks involving Wikipedia
3. English Wikipedia as a multi-document summarization dataset
Data augmentation in this paper: the source documents for each article are its cited references augmented with web search results, and the target is the article's lead section (a rough sketch of one example is below)
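A minimal sketch of how one example in such a dataset could be structured, purely illustrative (the field names are mine, not the paper's):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SummarizationExample:
    """One multi-document summarization example (hypothetical structure)."""
    title: str                                                # article title, also used as the query
    cited_sources: List[str] = field(default_factory=list)    # text of the article's cited references
    search_results: List[str] = field(default_factory=list)   # text of web search result pages
    target_lead: str = ""                                     # lead section of the Wikipedia article

    @property
    def input_documents(self) -> List[str]:
        # Cited references plus search results together form the input
        # handed to the extractive stage.
        return self.cited_sources + self.search_results
```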
4. Methods and models
4.1. Extractive stage
tf-idf performed best among the extractive methods in the extractive stage (see Table 3); a rough sketch of this kind of ranking follows
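A minimal sketch of tf-idf-based extraction, assuming scikit-learn and using the article title as the query; this is an illustration of the idea, not the paper's implementation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_paragraphs_by_tfidf(title, paragraphs, top_k=5):
    """Rank candidate paragraphs by tf-idf cosine similarity to the title
    (used as the query) and keep the top_k. Illustrative only."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the paragraphs plus the query so they share one vocabulary.
    matrix = vectorizer.fit_transform(list(paragraphs) + [title])
    paragraph_vecs, query_vec = matrix[:-1], matrix[-1]
    scores = cosine_similarity(paragraph_vecs, query_vec).ravel()
    ranked = sorted(zip(scores, paragraphs), key=lambda pair: pair[0], reverse=True)
    return [p for _, p in ranked[:top_k]]

# e.g. extracted = rank_paragraphs_by_tfidf("Article title", all_paragraphs, top_k=10)
```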
4.2. Abstractive stage
4.2.1. Data representation
4.2.2. Baseline models
T-ED: the standard Transformer encoder-decoder, used as a baseline model
4.2.3. Transformer Decoder (T-D)
T-D: a decoder-only Transformer that performs better than the baselines; the key point is the formulation below, where input and target are concatenated into one sequence and modeled as a plain language model
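Reconstructing that formulation from the paper (notation approximate): the source tokens, a separator, and the target tokens are flattened into a single sequence and trained with the usual autoregressive factorization.

```latex
% Concatenate source m, a separator token \delta, and target y into one sequence
% (w^1, \dots, w^{n+\eta+1}) = (m^1, \dots, m^n, \delta, y^1, \dots, y^\eta)
% and train a standard language model over it:
\[
  p(w^1, \dots, w^{n+\eta+1}) = \prod_{j} p\!\left(w^j \mid w^1, \dots, w^{j-1}\right)
\]
```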
4.2.4. Transformer decoder with memory-compressed attention (T-DMCA)
T-DMCA is the paper's final, best-performing model
Local attention: the sequence is split into blocks and self-attention is computed within each block independently, so memory does not grow with the full sequence length
Memory-compressed attention: keys and values are compressed with a strided convolution before attention, reducing the number of positions attended over
They use an LMLML layer ordering (L = local-attention layer, M = memory-compressed-attention layer); a rough sketch of both variants follows
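A minimal PyTorch-style sketch of the two attention variants as I understand them; block size, compression stride, and all names are illustrative, and the causal masking a decoder needs is omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def attention(q, k, v):
    # Standard scaled dot-product attention (no masking, for brevity).
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

class LocalAttention(nn.Module):
    """Split the sequence into fixed-size blocks and attend within each block."""
    def __init__(self, d_model, block_size=256):
        super().__init__()
        self.block_size = block_size
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        b, n, d = x.shape
        pad = (-n) % self.block_size            # pad so the length divides evenly
        x = F.pad(x, (0, 0, 0, pad))
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Fold blocks into the batch dimension and attend per block.
        fold = lambda t: t.reshape(b, -1, self.block_size, d).reshape(-1, self.block_size, d)
        out = attention(fold(q), fold(k), fold(v))
        out = out.reshape(b, -1, d)[:, :n]      # unfold and drop the padding
        return self.out(out)

class MemoryCompressedAttention(nn.Module):
    """Compress keys and values with a strided 1-D convolution before attending."""
    def __init__(self, d_model, stride=3):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.kv = nn.Linear(d_model, 2 * d_model)
        self.compress_k = nn.Conv1d(d_model, d_model, kernel_size=stride, stride=stride)
        self.compress_v = nn.Conv1d(d_model, d_model, kernel_size=stride, stride=stride)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        q = self.q(x)
        k, v = self.kv(x).chunk(2, dim=-1)
        # Conv1d wants (batch, channels, length), so transpose around it.
        k = self.compress_k(k.transpose(1, 2)).transpose(1, 2)
        v = self.compress_v(v.transpose(1, 2)).transpose(1, 2)
        return self.out(attention(q, k, v))

# T-DMCA stacks these layers in the order L, M, L, M, L ("LMLML").
```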
5. Experiments
5.1. Evaluation