dennlinger / summaries

A toolkit for summarization analysis and aspect-based summarizers
MIT License

Adding differentiation in how to count duplicates #55

Closed dennlinger closed 1 year ago

dennlinger commented 1 year ago

As discussed with Svea Klaus from the EUR-LexSum dataset, it would be helpful to know which kinds of duplication occur in a dataset.

This change introduces four duplicate types:

  1. exact_duplicate, where the exact combination of (reference, summary) has been encountered before.
  2. both_duplicate, where both the reference and summary have been encountered before, but separately and not together.
  3. reference_duplicate, where only the reference has been encountered before.
  4. summary_duplicate, where only the summary has been encountered before.
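
To make the four categories concrete, here is a minimal sketch of how such a classification could work. It is not the toolkit's actual implementation: the function name `classify_duplicates` and the choice to return `None` for non-duplicates are assumptions made purely for illustration, and samples are compared by exact string equality.

```python
from typing import Iterable, List, Optional, Set, Tuple


def classify_duplicates(pairs: Iterable[Tuple[str, str]]) -> List[Optional[str]]:
    """Label each (reference, summary) pair with a duplicate type, or None.

    Hypothetical helper for illustration; not part of the summaries library.
    """
    seen_pairs: Set[Tuple[str, str]] = set()
    seen_references: Set[str] = set()
    seen_summaries: Set[str] = set()
    labels: List[Optional[str]] = []

    for reference, summary in pairs:
        if (reference, summary) in seen_pairs:
            # The exact combination has been encountered before.
            label = "exact_duplicate"
        elif reference in seen_references and summary in seen_summaries:
            # Both parts were seen before, but never together.
            label = "both_duplicate"
        elif reference in seen_references:
            label = "reference_duplicate"
        elif summary in seen_summaries:
            label = "summary_duplicate"
        else:
            label = None  # first occurrence of both parts

        labels.append(label)
        # Record the sample only after classifying it.
        seen_pairs.add((reference, summary))
        seen_references.add(reference)
        seen_summaries.add(summary)

    return labels


if __name__ == "__main__":
    samples = [("a", "x"), ("a", "x"), ("b", "y"), ("a", "y"), ("a", "z"), ("c", "x")]
    for sample, label in zip(samples, classify_duplicates(samples)):
        print(sample, label)
```

The order of the checks matters: the exact-pair test must come first, because an exact duplicate also satisfies the conditions for the other three types.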