This code accompanies the AAAI 2021 paper *Contextualized Rewriting for Text Summarization*.
Python version: 3.6
Package requirements: `torch==1.1.0`, `pytorch_transformers`, `tensorboardX`, `multiprocess`, `pyrouge`
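Assuming a pip-based environment, the requirements above can be installed in one step (only `torch` is version-pinned by the repo; the remaining versions are left to pip, so treat this as a sketch):

```shell
pip install torch==1.1.0 pytorch_transformers tensorboardX multiprocess pyrouge
```

Note that `pyrouge` additionally expects a local ROUGE-1.5.5 installation to be configured before evaluation.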
Some code is borrowed from ONMT and PreSumm.
Results of the contextualized rewriter applied to various extractive summarizers on CNN/DailyMail (30/9/2020):
| Models | ROUGE-1 | ROUGE-2 | ROUGE-L | Words |
|---|---|---|---|---|
| Oracle of BERT-Ext | 46.77 | 26.78 | 43.32 | 112 |
| + ContextRewriter | 52.57 (+5.80) | 29.71 (+2.93) | 49.69 (+6.37) | 63 |
| LEAD-3 | 40.34 | 17.70 | 36.57 | 85 |
| + ContextRewriter | 41.09 (+0.75) | 18.19 (+0.49) | 38.06 (+1.49) | 55 |
| BERTSUMEXT w/o Tri-Bloc | 42.50 | 19.88 | 38.91 | 80 |
| + ContextRewriter | 43.31 (+0.81) | 20.44 (+0.56) | 40.33 (+1.42) | 54 |
| BERT-Ext (ours) | 41.04 | 19.56 | 37.66 | 105 |
| + ContextRewriter | 43.52 (+2.48) | 20.57 (+1.01) | 40.56 (+2.90) | 66 |
The contextualized rewriter can be evaluated with the following experimental script, which includes the LEAD-3, BERTSUMEXT, and BERT-Ext extractive summarizers. All parameters and settings are hard-coded in the script.
```
python src/exp_varext_guidabs.py
```
The rewriter can also be easily applied to other extractive summarizers using the following code. The full example can be found in `context_rewriter.py`.
```python
rewriter = ContextRewriter(args.model_file)
doc_lines = ["georgia high school ...", "less than 24 hours ...", ...]
ext_lines = ["georgia high school ...", "less than 24 hours ..."]
res_lines = rewriter.rewrite(doc_lines, ext_lines)
```
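For illustration, the LEAD-3 baseline from the table above simply extracts the first sentences of the document, so plugging it into the rewriter only takes a few lines. This is a sketch; the `lead3` helper name is ours and not part of the repo:

```python
def lead3(doc_lines, k=3):
    """Return the first k sentences of the document (the LEAD-3 baseline for k=3)."""
    return doc_lines[:k]

# The extracted sentences would then be rewritten in document context, e.g.:
# res_lines = rewriter.rewrite(doc_lines, lead3(doc_lines))
```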
The contextualized rewriter can be trained with the following script. All settings are packed into the .py file.
```
python src/exp_guidabs.py
```
By default, the input data path is `./bert_data`, and the output model path is `./exp_guidabs`.
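Before training, both default directories can be created up front (a minimal setup sketch; the preprocessed training data itself must be placed in `bert_data` separately):

```shell
mkdir -p ./bert_data ./exp_guidabs
```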