
ContextRewriter

This code is for the AAAI 2021 paper Contextualized Rewriting for Text Summarization, and is released under the MIT License.

Python Version: Python 3.6

Package Requirements: torch==1.1.0 pytorch_transformers tensorboardX multiprocess pyrouge
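
For example, the dependencies can be installed with pip (a minimal sketch; torch 1.1.0 is an old release, so the install may need a platform-specific wheel, and pyrouge additionally requires a local ROUGE-1.5.5 installation):

    pip install torch==1.1.0 pytorch_transformers tensorboardX multiprocess pyrouge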

Some code is borrowed from ONMT and PreSumm.

Results

Results of the contextualized rewriter applied to various extractive summarizers on CNN/DailyMail (30/9/2020):

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | Words |
|---|---|---|---|---|
| Oracle of BERT-Ext | 46.77 | 26.78 | 43.32 | 112 |
| + ContextRewriter | 52.57 (+5.80) | 29.71 (+2.93) | 49.69 (+6.37) | 63 |
| LEAD-3 | 40.34 | 17.70 | 36.57 | 85 |
| + ContextRewriter | 41.09 (+0.75) | 18.19 (+0.49) | 38.06 (+1.49) | 55 |
| BERTSUMEXT w/o Tri-Blocking | 42.50 | 19.88 | 38.91 | 80 |
| + ContextRewriter | 43.31 (+0.81) | 20.44 (+0.56) | 40.33 (+1.42) | 54 |
| BERT-Ext (ours) | 41.04 | 19.56 | 37.66 | 105 |
| + ContextRewriter | 43.52 (+2.48) | 20.57 (+1.01) | 40.56 (+2.90) | 66 |

Model Evaluation

The contextualized rewriter can be evaluated with the following experimental script, which includes the LEAD-3, BERTSUMEXT, and BERT-Ext extractive summarizers. All parameters and settings are hard-coded in the .py file.

    python src/exp_varext_guidabs.py 

The rewriter can also be easily applied to other extractive summarizers using the following code. The full example can be found in context_rewriter.py.

    # Load a trained rewriter from a checkpoint file.
    rewriter = ContextRewriter(args.model_file)

    # doc_lines: all sentences of the source document, one sentence per entry.
    doc_lines = ["georgia high school ...", "less than 24 hours ...", ...]
    # ext_lines: the sentences selected by the extractive summarizer.
    ext_lines = ["georgia high school ...", "less than 24 hours ..."]
    # res_lines: the rewritten, context-aware summary sentences.
    res_lines = rewriter.rewrite(doc_lines, ext_lines)
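
For instance, a LEAD-3 style extraction can be plugged in by simply taking the first three document sentences as the extract (a minimal sketch, reusing the rewriter and doc_lines from above):

    # Toy LEAD-3 extractor: the first three document sentences form the
    # extract, which the rewriter then rewrites in document context.
    ext_lines = doc_lines[:3]
    res_lines = rewriter.rewrite(doc_lines, ext_lines)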

Model Training

The contextualized rewriter can be trained with the following script. All settings are packed into the .py file.

    python src/exp_guidabs.py

By default, the input data path is ./bert_data, and the output model path is ./exp_guidabs.
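
For reference, the expected layout would look like the following (a sketch; the input format is assumed to follow PreSumm-style preprocessed data, from which this code base borrows):

    ./bert_data/      # input: preprocessed CNN/DailyMail training data (PreSumm-style .pt shards, assumed)
    ./exp_guidabs/    # output: model checkpoints written during training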