This work on graph-based extractive text summarization for scientific documents is motivated by the SummPip paper, but the pipeline differs: the authors introduce two new steps to control summary length and to remove irrelevant sentences. Also, unlike SummPip, this work does single-document summarization.
Graph Creation:
Before building the sentence graph, they rank sentences with PageRank over pairwise similarity scores stored in a matrix; lower-scored sentences are removed from the candidate list. They then build a graph where each node is a sentence, and two nodes are connected iff one of four patterns holds: deverbal noun reference, same-entity continuation, discourse markers, or sentence similarity (cosine similarity). The similarity is computed with SciBERT, an embedding model pretrained on scientific documents.
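The ranking step can be sketched as plain power-iteration PageRank over a pairwise sentence-similarity matrix. This is an illustrative sketch, not the paper's code; the toy scores below stand in for SciBERT cosine similarities:

```python
def pagerank(sim, damping=0.85, iters=100):
    # Power-iteration PageRank over a symmetric sentence-similarity matrix.
    n = len(sim)
    out = [sum(row) for row in sim]  # total outgoing edge weight per sentence
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for j in range(n):
            # Each sentence i passes its score along edges, proportional to weight.
            rank = sum(scores[i] * sim[i][j] / out[i] for i in range(n) if out[i])
            new.append((1 - damping) / n + damping * rank)
        scores = new
    return scores

# Toy similarity matrix for three sentences (zero self-similarity);
# sentence 0 is similar to both others, so it should rank highest.
sim = [[0.0, 0.8, 0.7],
       [0.8, 0.0, 0.1],
       [0.7, 0.1, 0.0]]
scores = pagerank(sim)
```

Sentences whose score falls below a cutoff would then be dropped from the candidate list.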
Spectral Clustering: Spectral clustering is then applied to the sentence graph. Using Multi-Sentence Compression (MSC), one sentence is taken from each cluster, and these sentences are combined to form the summary.
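A minimal sketch of the clustering step: a 2-way spectral partition using the sign of the Fiedler vector of the normalized graph Laplacian. The toy embeddings stand in for SciBERT vectors, and the real pipeline clusters into more than two groups:

```python
import numpy as np

def spectral_bipartition(affinity):
    """Split items into two clusters via the Fiedler vector of the
    symmetric normalized Laplacian (a minimal spectral-clustering sketch)."""
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    # L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy 2-D sentence embeddings forming two obvious groups
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
affinity = np.clip(norm @ norm.T, 0, None)   # cosine-similarity affinity
labels = spectral_bipartition(affinity)
```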
For comparison, they tried different embedding systems: SciBERT, SummPip's original embeddings, and SBERT. In the ROUGE analysis, SciBERT performed best.
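For intuition about the evaluation metric, here is a minimal unigram-overlap ROUGE-1 F1 computation. Real evaluations use a full ROUGE toolkit (stemming, multiple variants such as ROUGE-2 and ROUGE-L); this is only a sketch:

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())     # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("the model ranks sentences", "the model ranks all sentences")
```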
Contributions of The Paper
highlight the importance of sentence embeddings for scientific work (mainly task-specific domains)
compare the performance of PageRank (Page et al., 1999) and Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998), verifying that the PageRank ranking algorithm performs better than the MMR strategy in the extractive task
achieve better ROUGE results than the original model on both the training dataset and the blind test dataset; besides, the model is also evaluated with the BERTScore metric
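For reference on the PageRank-vs-MMR comparison: MMR greedily balances a sentence's relevance against its redundancy with already-selected sentences. A toy sketch (the scores and λ = 0.7 below are illustrative, not from the paper):

```python
def mmr_select(relevance, sim, k, lam=0.7):
    """Maximal Marginal Relevance: greedily pick k items, trading off
    relevance against similarity to items already chosen."""
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Items 0 and 1 are near-duplicates, so MMR picks 0 and then skips 1
rel = [0.9, 0.85, 0.5]
sim = [[1.0, 0.95, 0.1],
       [0.95, 1.0, 0.1],
       [0.1, 0.1, 1.0]]
picked = mmr_select(rel, sim, 2)
```

A plain top-k by relevance would pick the two near-duplicates 0 and 1; MMR's redundancy penalty diversifies the selection instead.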
Comments
Maximal Marginal Relevance (MMR) reportedly performs better than generic ranking methods for summarization. Need to check.
TextRank model (Barrios et al., 2016) with the Okapi BM25 similarity function. Need to check.
Need to recheck this paper. There seems to be an inconsistency (or am I missing something?): it is described as abstractive summarization while being unsupervised. How?
Publisher
Association for Computational Linguistics
Link to The Paper
https://aclanthology.org/2020.sdp-1.37/
Name of The Authors
Ju J, Liu M, Gao L, et al.
Year of Publication
2020