This is the official code for the paper ['Systematically Exploring Redundancy Reduction in Summarizing Long Documents']() (AACL 2020).
In this paper, we systematically explore ways to reduce redundancy in extractive summarization of long documents.
Make sure you have Python 3 and PyTorch installed.
First, install the tool rouge_papier_v2 (a modified version of https://github.com/kedz/rouge_papier):
python setup.py install
Other dependencies needed: numpy, pandas, and nltk (with word_tokenize and the stopwords corpus).
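As a reference, a minimal setup might look like the following sketch (package versions are not pinned here, and the rouge_papier_v2 directory name assumes the tool is bundled at the repository root):

```bash
# Install the core dependencies (exact versions are an assumption; use what matches your setup)
pip install torch numpy pandas nltk

# Fetch the NLTK resources used for tokenization and stopword removal
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"

# Build and install the bundled ROUGE wrapper (assumed to live in ./rouge_papier_v2)
cd rouge_papier_v2
python setup.py install
cd ..
```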
We conduct the experiments on two scientific paper datasets, PubMed and arXiv.
The trained models reported in the paper are available here.
If you download the trained models and the data, and put them in ./pretrained_models and ./scientific_paper_dataset/ respectively, you can use the following command to evaluate the trained models.
python test.py --modelpath ./pretrained_models --datapath ./scientific_paper_dataset/ --dataset pubmed
For different models, you need to add different arguments, as listed below (a full example command follows the list):
The original model (ExtSumLG): --model ac
ExtSumLG + SR Decoder: --model ac_sr
ExtSumLG + NeuSum Decoder: --model ac_neusum
ExtSumLG + RdLoss (beta=0.3): --model ac --beta 0.3
ExtSumLG + Trigram Block: --model ac --use_trigram_block
ExtSumLG + MMR-Select (lambda=0.6): --model ac --use_mmr --lambd 0.6
ExtSumLG + MMR-Select+ (lambda=0.6, gamma=0.99): --model ac --use_rl --use_mmr --lambd 0.6 --gamma 0.99
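For example, to evaluate ExtSumLG + MMR-Select on PubMed, combine the base command with the corresponding flags (the paths below are placeholders for wherever you stored the model and data):

```bash
python test.py --modelpath ./pretrained_models --datapath ./scientific_paper_dataset/ --dataset pubmed --model ac --use_mmr --lambd 0.6
```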
If you want to train your own model, you can run the following command.
python main.py --modelpath ./pretrained_models --datapath ./scientific_paper_dataset/ --dataset pubmed
For different models, you need to add different arguments, as listed below (a full example command follows the list):
The original model (ExtSumLG): --model ac
ExtSumLG + SR Decoder: --model ac_sr
ExtSumLG + NeuSum Decoder: --model ac_neusum
ExtSumLG + RdLoss (beta=0.3): --model ac --beta 0.3
ExtSumLG + MMR-Select+ (lambda=0.6, gamma=0.99): --model ac --use_rl --use_mmr --lambd 0.6 --gamma 0.99
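For example, a training run for ExtSumLG + RdLoss (beta=0.3) on PubMed might look like this, assuming the same folder layout as above:

```bash
python main.py --modelpath ./pretrained_models --datapath ./scientific_paper_dataset/ --dataset pubmed --model ac --beta 0.3
```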
You are free to play with different hyperparameters, which can be found in main.py.
Coming Soon.