This work on graph-based extractive text summarization for scientific documents is motivated by the SummPip paper, but the pipeline differs: the authors introduce two new steps to control summary length and to remove irrelevant sentences. Also, unlike SummPip, this work does single-document summarization.
Graph Creation:
Before building the sentence graph, they rank sentences with PageRank over pairwise similarity scores stored in a matrix; lower-scored sentences are removed from the candidate list. They then build a graph where each node is a sentence, and two nodes are connected iff one of four patterns holds: deverbal noun reference, same-entity continuation, discourse markers, or sentence similarity (cosine similarity). The similarity is computed with SciBERT, an embedding model pretrained on scientific documents.
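The ranking step can be sketched as plain power-iteration PageRank over a pairwise sentence-similarity matrix. This is an illustrative sketch, not the paper's code; the toy scores below stand in for SciBERT cosine similarities:

```python
def pagerank(sim, damping=0.85, iters=100):
    # Power-iteration PageRank over a symmetric sentence-similarity matrix.
    n = len(sim)
    out = [sum(row) for row in sim]  # total outgoing edge weight per sentence
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for j in range(n):
            # Each sentence i passes its score along edges, proportional to weight.
            rank = sum(scores[i] * sim[i][j] / out[i] for i in range(n) if out[i])
            new.append((1 - damping) / n + damping * rank)
        scores = new
    return scores

# Toy similarity matrix for three sentences (zero self-similarity);
# sentence 0 is similar to both others, so it should rank highest.
sim = [[0.0, 0.8, 0.7],
       [0.8, 0.0, 0.1],
       [0.7, 0.1, 0.0]]
scores = pagerank(sim)
```

Sentences whose score falls below a cutoff would then be dropped from the candidate list.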
Spectral Clustering: Spectral clustering is then applied to the sentence graph. Using Multi-Sentence Compression (MSC), one sentence is taken from each cluster, and these sentences are combined to form the summary.
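A minimal sketch of the clustering step: a 2-way spectral partition using the sign of the Fiedler vector of the normalized graph Laplacian. The toy embeddings stand in for SciBERT vectors, and the real pipeline clusters into more than two groups:

```python
import numpy as np

def spectral_bipartition(affinity):
    """Split items into two clusters via the Fiedler vector of the
    symmetric normalized Laplacian (a minimal spectral-clustering sketch)."""
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    # L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy 2-D sentence embeddings forming two obvious groups
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
affinity = np.clip(norm @ norm.T, 0, None)   # cosine-similarity affinity
labels = spectral_bipartition(affinity)
```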
For comparison, they tried different embedding systems: SciBERT, SummPip's original embeddings, and SBERT. In the ROUGE analysis, SciBERT performed best.
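For intuition about the evaluation metric, here is a minimal unigram-overlap ROUGE-1 F1 computation. Real evaluations use a full ROUGE toolkit (stemming, multiple variants such as ROUGE-2 and ROUGE-L); this is only a sketch:

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())     # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("the model ranks sentences", "the model ranks all sentences")
```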
Contributions of The Paper
highlight the importance of sentence embeddings for scientific work (mainly task-specific domains)
compare the performance of PageRank (Page et al., 1999) and Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998), verifying that the PageRank ranking algorithm performs better than the MMR strategy in the extractive task
achieve better ROUGE results than the original model on both the training dataset and the blind test dataset; besides, the model is also evaluated with the BERTScore metric
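For reference on the PageRank-vs-MMR comparison: MMR greedily balances a sentence's relevance against its redundancy with already-selected sentences. A toy sketch (the scores and λ = 0.7 below are illustrative, not from the paper):

```python
def mmr_select(relevance, sim, k, lam=0.7):
    """Maximal Marginal Relevance: greedily pick k items, trading off
    relevance against similarity to items already chosen."""
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Items 0 and 1 are near-duplicates, so MMR picks 0 and then skips 1
rel = [0.9, 0.85, 0.5]
sim = [[1.0, 0.95, 0.1],
       [0.95, 1.0, 0.1],
       [0.1, 0.1, 1.0]]
picked = mmr_select(rel, sim, 2)
```

A plain top-k by relevance would pick the two near-duplicates 0 and 1; MMR's redundancy penalty diversifies the selection instead.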
Comments
Maximal Marginal Relevance (MMR) reportedly performs better than generic ranking methods for summarization. Need to check.
TextRank model (Barrios et al., 2016) with the Okapi BM25 similarity function. Need to check.
Need to recheck this paper. There seems to be an inconsistency (or am I missing something?): it is described as abstractive summarization while being unsupervised. How?
Publisher
Association for Computational Linguistics
Link to The Paper
https://aclanthology.org/2020.sdp-1.37/
Name of The Authors
Ju J, Liu M, Gao L, et al.
Year of Publication
2020