It is well-known that abstractive summaries are subject to hallucination---including material that is not supported by the original text. While summaries can be made hallucination-free by limiting them to general phrases, such summaries would fail to be very informative. Alternatively, one can try to avoid hallucinations by verifying that any specific entities in the summary appear in the original text in a similar context. This is the approach taken by our system, Herman. The system learns to recognize and verify quantity entities (dates, numbers, sums of money, etc.) in a beam-worth of abstractive summaries produced by state-of-the-art models, in order to up-rank those summaries whose quantity terms are supported by the original text. Experimental results demonstrate that the ROUGE scores of such up-ranked summaries have a higher Precision than summaries that have not been up-ranked, without a comparable loss in Recall, resulting in higher F$_1$. Preliminary human evaluation of up-ranked vs. original summaries shows people's preference for the former.
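The up-ranking idea can be sketched as follows. This is a minimal illustration, not the Herman system itself (which *learns* to recognize and verify quantity entities); here a crude regular expression stands in for the learned recognizer, and beam candidates whose quantity terms all appear in the source text are moved ahead of those with unsupported terms.

```python
import re

# Crude stand-in for a learned quantity-entity recognizer:
# matches numbers, years, and sums of money like "$5" or "2019".
QUANTITY = re.compile(r"\$?\d[\d,.]*%?")

def quantity_terms(text):
    """Return the set of quantity-like terms found in text."""
    return set(QUANTITY.findall(text))

def up_rank(source, beam):
    """Re-rank a beam of (summary, model_score) pairs, best-first.

    Summaries whose quantity terms are all supported by the source
    are ranked ahead of the rest; otherwise original order is kept.
    """
    source_terms = quantity_terms(source)

    def supported(summary):
        return quantity_terms(summary) <= source_terms

    # Stable sort: verified candidates first, beam order preserved within ties.
    return sorted(beam, key=lambda pair: not supported(pair[0]))

source = "The company earned $5 million in 2019."
beam = [("Profits hit $7 million in 2019.", 0.9),   # "$7" unsupported
        ("The firm earned $5 million in 2019.", 0.8)]
print(up_rank(source, beam)[0][0])
```

A real system would use a trained tagger and a context-sensitive verifier rather than exact string matching, but the re-ranking step itself reduces to a stable sort keyed on verification, as above.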