We provide the source code for the paper "Controlling the Amount of Verbatim Copying in Abstractive Summarization", accepted at AAAI'20. If you find the code useful, please cite the following paper.
@inproceedings{control-over-copying:2020,
Author = {Kaiqiang Song and Bingqing Wang and Zhe Feng and Liu Ren and Fei Liu},
Title = {Controlling the Amount of Verbatim Copying in Abstractive Summarization},
Booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
Year = {2020}}
Our system seeks to rewrite a lengthy sentence, often the first sentence of a news article, into a concise, title-like summary. The average input and output lengths are 31 words and 8 words, respectively.
The code takes as input a text file with one sentence per line. It writes its output to a text file ("summary.txt") in the working folder, in which each source sentence is replaced by a title-like summary.
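To inspect results side by side, the input and output files can be read back together. The following is a minimal sketch, assuming that line i of "summary.txt" corresponds to line i of the input file; the helper names `pair_lines` and `load_pairs` are hypothetical and not part of this repo.

```python
from pathlib import Path


def pair_lines(source_text, summary_text):
    """Pair each source sentence with the summary on the same line.

    Assumption: the two texts are line-aligned (one sentence per line).
    """
    sources = source_text.splitlines()
    summaries = summary_text.splitlines()
    if len(sources) != len(summaries):
        raise ValueError(
            f"line count mismatch: {len(sources)} sources, "
            f"{len(summaries)} summaries"
        )
    return list(zip(sources, summaries))


def load_pairs(input_path, summary_path="summary.txt"):
    """Read both files from disk and return (source, summary) pairs."""
    source_text = Path(input_path).read_text(encoding="utf-8")
    summary_text = Path(summary_path).read_text(encoding="utf-8")
    return pair_lines(source_text, summary_text)
```

For example, `load_pairs("data/test.txt")` would return a list of tuples once "summary.txt" has been generated in the working folder.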
Example input and output are shown below.
Belgian authorities are investigating the killing of two policewomen and a passerby in the eastern city of Liege on Tuesday as a terror attack, the country's prosecutor said.
Belgium probes killing of two policewomen as terror attack .
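One rough way to see how much of a summary is copied verbatim is to count the summary words that also occur in the source. The sketch below is only an illustration and not necessarily the measure used in the paper; the function names and the simple tokenization (lowercasing, whitespace splitting, stripping surrounding punctuation) are assumptions.

```python
import string


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def copy_rate(source, summary):
    """Fraction of summary tokens that also appear in the source."""
    source_vocab = set(tokenize(source))
    summary_tokens = tokenize(summary)
    if not summary_tokens:
        return 0.0
    copied = sum(tok in source_vocab for tok in summary_tokens)
    return copied / len(summary_tokens)
```

For the example above, 7 of the 9 summary words ("killing", "of", "two", "policewomen", "as", "terror", "attack") occur in the source, while "Belgium" and "probes" do not.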
The code is written in Python (v3.7) and PyTorch (v1.3). We suggest the following environment:
HINT: The pytorch-pretrained-bert package may change its name and contents over time. It is currently named transformers.
To install Python (v3.7), run the commands below.
$ wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
$ bash Anaconda3-2019.10-Linux-x86_64.sh
$ source ~/.bashrc
To install PyTorch (v1.3) and its dependencies, run the command below.
$ conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
To install pytorch-pretrained-bert and its dependencies, run the commands below.
$ pip install spacy ftfy==4.4.3
$ python -m spacy download en
$ pip install pytorch-pretrained-bert
To install Pyrouge, run the command below. Pyrouge is a Python wrapper for the ROUGE toolkit, an automatic metric used for summary evaluation.
$ pip install pyrouge
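To give a feel for what ROUGE measures, here is a simplified sketch of unigram overlap (ROUGE-1) between a generated summary and a reference. It is not the official toolkit and does not use Pyrouge: it assumes plain whitespace tokenization and skips options such as stemming, so its numbers may differ from the toolkit's. The function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, F1) over clipped unigram matches."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Calling `rouge_1(system_summary, reference_summary)` on two strings returns the three scores as floats between 0 and 1.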
Clone this repo. Download the ZIP file (others.zip) containing the trained model. Move the ZIP file to the working folder and uncompress it.
$ git clone git@github.com:KaiQiangSong/control-over-copying.git
$ mv others.zip control-over-copying
$ cd control-over-copying
$ unzip others.zip
$ rm others.zip
$ mkdir log
Generating summaries with our summarization model, trained on one of the following datasets: gigaword (default) or newsroom.
$ python run.py --do_test --inputFile data/test.txt
Or, to run a model other than the one trained on gigaword:
$ python run.py --do_test --dataset newsroom --inputFile data/test.txt
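When summarizing several files or switching between datasets, the two commands above can be assembled from a script. This is a small sketch that only builds the argument lists; the helper name `build_test_command` is hypothetical, and only the flags shown above (`--do_test`, `--dataset`, `--inputFile`) are used.

```python
def build_test_command(input_file, dataset=None):
    """Build the argument list for one test-mode run of run.py.

    When dataset is None, no --dataset flag is passed, so run.py
    falls back to its default (gigaword, per the text above).
    """
    cmd = ["python", "run.py", "--do_test"]
    if dataset is not None:
        cmd += ["--dataset", dataset]
    cmd += ["--inputFile", input_file]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repo's working folder.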
Training the model with training files and validation files.
$ python run.py --do_train --train_prefix data/train --valid_prefix data/valid
(Optional) Modify the training options.
You might want to change the parameters used for training. These are specified in ./setttings/training/gigaword_8.json and explained below.
{
"stopConditions":
{
"max_epoch":12,
"earlyStopping":false,
"rateReduce_bound":200000
},
"checkingPoints":
{
"checkMin":0,
"checkFreq":2000,
"everyEpoch":true
}
}
HINT: 200K batches (the value of rateReduce_bound) with a batch size of 8 is slightly less than half of an epoch.
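The arithmetic behind the hint can be checked directly from the options. The sketch below parses the JSON shown above from an inline string rather than from the settings file; the batch size of 8 is taken from the hint, and the variable names are illustrative.

```python
import json

# Inline copy of the training options shown above.
CONFIG_TEXT = """
{
  "stopConditions": {
    "max_epoch": 12,
    "earlyStopping": false,
    "rateReduce_bound": 200000
  },
  "checkingPoints": {
    "checkMin": 0,
    "checkFreq": 2000,
    "everyEpoch": true
  }
}
"""

config = json.loads(CONFIG_TEXT)

batch_size = 8  # from the hint above
batches = config["stopConditions"]["rateReduce_bound"]

# Training examples processed by the time rateReduce_bound is reached.
examples_seen = batches * batch_size

# "Slightly less than half of an epoch" implies that one epoch
# holds somewhat more than twice this many examples.
implied_epoch_lower_bound = 2 * examples_seen

print(f"examples seen at rateReduce_bound: {examples_seen:,}")
print(f"implied epoch size: more than {implied_epoch_lower_bound:,}")
```

So 200K batches cover 1.6 million training examples, which means a full epoch contains a little more than 3.2 million. If you change the batch size, rescale rateReduce_bound accordingly to keep the same fraction of an epoch.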