dennlinger / summaries

A toolkit for summarization analysis and aspect-based summarizers
MIT License

Extend duplication detection to cross-contaminated samples #39

Open · dennlinger opened this issue 1 year ago


Notably, none of the duplication detection functions in `Analyzer` account for contamination across references and summaries, i.e., cases where the reference text of one data instance appears as the summary of another one.
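For the exact case, a check like this could index every summary once and then look up each reference, rather than comparing all pairs. The following is a minimal sketch, not the `Analyzer` implementation: `find_cross_contamination` and `_normalize` are hypothetical names, and the lowercasing/whitespace normalization is an assumption that may differ from what the existing "exact" method does.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def _normalize(text: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace so that
    # trivial formatting differences do not hide a duplicate.
    return " ".join(text.lower().split())


def find_cross_contamination(
    references: Sequence[str], summaries: Sequence[str]
) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where reference i exactly matches summary j and i != j."""
    # One pass over the summaries: map normalized text -> indices of summaries with that text.
    summary_index: Dict[str, List[int]] = defaultdict(list)
    for j, summary in enumerate(summaries):
        summary_index[_normalize(summary)].append(j)

    pairs: List[Tuple[int, int]] = []
    # One dictionary lookup per reference instead of a comparison against every summary.
    for i, reference in enumerate(references):
        # .get() does not insert missing keys into the defaultdict.
        for j in summary_index.get(_normalize(reference), []):
            if i != j:  # only matches against *other* instances count as cross-contamination
                pairs.append((i, j))
    return pairs


references = ["The cat sat on the mat. It was warm.", "Stocks fell sharply on Monday.", "A short note."]
summaries = ["Cat on mat.", "the cat  sat on the mat. It was warm.", "Stocks fell."]
print(find_cross_contamination(references, summaries))  # [(0, 1)]
```

Because each lookup is a hash-table access, this stays roughly linear in the number of samples, which is why the exact variant is the cheap one.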

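It would be interesting to see whether this is actually a problem (i.e., whether it happens in the wild), but it should not be too difficult to implement regardless. The only downside is that this can be quite costly in terms of computation, especially when using comparison methods other than exact matching: exact matches can be found with one hash lookup per sample, whereas a naive similarity-based check has to compare every reference against every summary, i.e., on the order of n² pairs for n samples.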
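To make the cost of a non-exact method concrete, here is a minimal sketch of a naive fuzzy variant. It uses Python's `difflib.SequenceMatcher` on lowercased whitespace tokens as a stand-in similarity measure; this is an assumption for illustration, not one of the comparison methods the repository actually offers, and `find_fuzzy_cross_contamination` and the `threshold=0.9` default are hypothetical. The nested loop visits every (reference, summary) pair, so 100,000 samples already mean roughly 10^10 comparisons.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def find_fuzzy_cross_contamination(
    references: Sequence[str],
    summaries: Sequence[str],
    threshold: float = 0.9,  # hypothetical cut-off; would need tuning per dataset
) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for pairs with i != j whose token similarity >= threshold."""
    summary_tokens = [s.lower().split() for s in summaries]
    hits: List[Tuple[int, int, float]] = []
    # autojunk=False: the default heuristic can discard frequent tokens in long texts.
    matcher = SequenceMatcher(autojunk=False)
    for i, reference in enumerate(references):
        # SequenceMatcher caches information about the second sequence, so set it once per reference.
        matcher.set_seq2(reference.lower().split())
        for j, tokens in enumerate(summary_tokens):  # quadratic: every summary for every reference
            if i == j:
                continue
            matcher.set_seq1(tokens)
            # Cheap upper bounds first; the full ratio() is only computed if they pass.
            if matcher.real_quick_ratio() < threshold or matcher.quick_ratio() < threshold:
                continue
            score = matcher.ratio()
            if score >= threshold:
                hits.append((i, j, score))
    return hits


references = ["the cat sat on the mat today", "stocks fell sharply on monday"]
summaries = ["cat on mat", "The cat sat on the mat today", "Stocks fell"]
print(find_fuzzy_cross_contamination(references, summaries))  # [(0, 1, 1.0)]
```

The `real_quick_ratio()`/`quick_ratio()` pre-checks prune many pairs cheaply, but they do not change the quadratic number of pairs that must be visited; avoiding that would require a different strategy (e.g., blocking or approximate nearest-neighbor search), which is beyond this sketch.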