
NonFactS

NonFactS: Nonfactual Summary Generation for Factuality Evaluation in Document Summarization (accepted at ACL 2023)

Authors: Amir Soleimani, Christof Monz, Marcel Worring

Abstract

Pre-trained abstractive summarization models can generate fluent summaries and achieve high ROUGE scores. Previous research has found that these models often generate summaries that are inconsistent with their context document and contain nonfactual information. To evaluate factuality in document summarization, a document-level Natural Language Inference (NLI) classifier can be used. However, training such a classifier requires large-scale high-quality factual and nonfactual samples. To that end, we introduce NonFactS, a data generation model, to synthesize nonfactual summaries given a context document and a human-annotated (reference) factual summary. Compared to previous methods, our nonfactual samples are more abstractive and more similar to their corresponding factual samples, resulting in state-of-the-art performance on two factuality evaluation benchmarks, FALSESUM and SUMMAC. Our experiments demonstrate that even without human-annotated summaries, NonFactS can use random sentences to generate nonfactual summaries and a classifier trained on these samples generalizes to out-of-domain documents.

Limitations

NonFactS generates grammatically correct nonfactual summaries. In practice, however, summaries can be ungrammatical, noisy, and nonsensical, which may limit how well our performance generalizes to such cases. Additionally, hypothesis-only results show that a considerable number of samples are classified correctly without their context document. Possible reasons include knowledge memorized by pre-trained classifiers, as well as surface features and semantic plausibility.

Broader Impact

Our model has no direct environmental impact and raises no fairness or privacy concerns. However, it must not be used as a fact-checking tool, as there is a risk that false statements are labelled as true. Our classifier evaluates the factuality of a summary with respect to a context document; if the document itself is misleading, a summary can be judged factual based on misleading information. Additionally, NonFactS generates nonfactual summaries, which carries a potential risk of misuse for mass-generating nonfactual claims. Addressing such risks is an open issue in the field and is not specific to our work.

Requirements

Installation

Note: double-check that the Hugging Face Transformers version is 4.4.0.dev0.
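
As a quick sanity check after installation, a minimal sketch that verifies the installed version from Python (the version string comes from the note above):

```python
# Verify that the installed Transformers version matches the pinned one.
import transformers

expected = "4.4.0.dev0"
installed = transformers.__version__
assert installed == expected, f"Expected transformers=={expected}, found {installed}"
```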

Training datasets (Factual and NonFactual summaries)

Training datasets contain 50% positive (Factual) summaries and 50% negative (NonFactual) summaries
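
The README does not fix a file schema here, so the following is only a loading sketch under assumed names: a tab-separated file `train.tsv` with `document`, `summary`, and a binary `label` column (1 = factual, 0 = nonfactual).

```python
# Minimal loading sketch; the file name and column names are assumptions,
# not the repository's actual schema.
import csv

def load_pairs(path="train.tsv"):
    """Yield (document, summary, label) triples; label 1 = factual, 0 = nonfactual."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            yield row["document"], row["summary"], int(row["label"])

pairs = list(load_pairs())
# The dataset is balanced, so roughly half the labels should be 1 (factual).
print(sum(label for _, _, label in pairs) / max(len(pairs), 1))
```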

Models

Classifier:
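
As an illustration of how a document-level NLI classifier of this kind is typically applied, a hedged sketch using `roberta-large` as a stand-in backbone; the actual released checkpoint and its label order are assumptions, not specified here.

```python
# Illustrative only: scores a (document, summary) pair for factual consistency.
# The backbone, checkpoint name, and index of the "factual" label are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)

document = "The city council approved the budget on Monday."
summary = "The budget was approved by the city council."

# Premise (document) and hypothesis (summary) are encoded as one pair, NLI-style.
inputs = tokenizer(document, summary, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)  # e.g. [[p_nonfactual, p_factual]] under the assumed label order
```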

Generator:
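
For the generator, a sketch of the generation interface one would expect from a seq2seq model conditioned on a document and its reference summary; `facebook/bart-large` is a stand-in for the released checkpoint, and the concatenated input format is an assumption.

```python
# Illustrative only: the checkpoint and the input format (document </s> summary)
# are assumptions; the released generator checkpoint should replace "facebook/bart-large".
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

document = "The city council approved the budget on Monday."
reference_summary = "The budget was approved by the city council."

# Condition on the document and the factual reference summary to produce
# a fluent but nonfactual variant of the summary.
text = f"{document} {tokenizer.sep_token} {reference_summary}"
inputs = tokenizer(text, truncation=True, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```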

Download the training and test dataset: