dennlinger / summaries

A toolkit for summarization analysis and aspect-based summarizers
MIT License

Updating consistency in `Analyzer`, and adding `Cleaner` #41

Closed: dennlinger closed this 1 year ago

dennlinger commented 1 year ago

As outlined in #37, this moves some functions towards more sample-centric behavior. Currently, only the duplicate detection in `Analyzer` still operates on the full dataset, because `Cleaner` uses a different filtering approach.
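The difference between sample-centric and dataset-level checks can be illustrated with a minimal Python sketch. It assumes samples are dicts with hypothetical `"reference"` and `"summary"` keys and uses whitespace tokenization; the function names are made up for illustration and are not the actual `Analyzer` API. A length check needs only one sample at a time, whereas duplicate detection inherently needs to see the whole dataset.

```python
from typing import Dict, List, Set


# Hypothetical sample-centric check: it only needs a single (reference, summary) pair.
def summary_longer_than_reference(reference: str, summary: str) -> bool:
    # Whitespace tokenization is an assumption of this sketch.
    return len(summary.split()) > len(reference.split())


# Hypothetical dataset-level check: duplicates only exist across samples,
# so this function has to iterate over the full dataset.
def find_duplicate_indices(samples: List[Dict[str, str]], key: str = "reference") -> Set[int]:
    first_seen: Dict[str, int] = {}
    duplicates: Set[int] = set()
    for idx, sample in enumerate(samples):
        text = sample[key]
        if text in first_seen:
            # Report both the earlier and the current index; an analysis step
            # does not need to decide which copy to remove.
            duplicates.add(first_seen[text])
            duplicates.add(idx)
        else:
            first_seen[text] = idx
    return duplicates
```

Reporting every index involved in a duplicate is sufficient for analysis, but it does not tell a cleaning step which copy to drop.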
`Cleaner` is another utility that uses the primary functions from `Analyzer` to actually filter a dataset (or rather, several splits of the same dataset). By default, it applies some light length filtering (e.g., removing samples whose summary is longer than the reference) and also checks for duplicates, although in a slightly different fashion than `Analyzer`, since it also has to make sure the correct samples are removed.
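A minimal Python sketch of what a `Cleaner`-style pass over several splits could look like is shown below. It is not the repository's implementation: `clean_splits`, the `"reference"`/`"summary"` keys, whitespace tokenization, and the keep-first-occurrence policy are all assumptions made for illustration. The key difference from analysis is that, when the same reference appears more than once, cleaning has to choose which copy survives instead of just reporting all of them.

```python
from typing import Dict, List, Optional, Sequence

Sample = Dict[str, str]


def clean_splits(
    splits: Dict[str, List[Sample]],
    split_order: Optional[Sequence[str]] = None,
) -> Dict[str, List[Sample]]:
    """Hypothetical Cleaner-style pass: length filter plus cross-split deduplication."""
    # Splits are processed in this order; earlier splits win when duplicates occur.
    # Splits not listed in split_order are left out of the result.
    order = list(split_order) if split_order is not None else list(splits)
    seen_references = set()
    cleaned: Dict[str, List[Sample]] = {}
    for split_name in order:
        kept: List[Sample] = []
        for sample in splits[split_name]:
            reference, summary = sample["reference"], sample["summary"]
            # Light length filter (whitespace tokens, an assumption):
            # drop samples whose summary is longer than its reference.
            if len(summary.split()) > len(reference.split()):
                continue
            # Deduplication must decide which copy survives: keep the first
            # occurrence seen so far and drop later samples with the same reference.
            if reference in seen_references:
                continue
            seen_references.add(reference)
            kept.append(sample)
        cleaned[split_name] = kept
    return cleaned
```

In this sketch, passing the evaluation splits first in `split_order` (e.g. `("test", "validation", "train")`) keeps those splits intact and removes overlapping samples from the training data instead; this ordering is a design choice of the example, not something the PR specifies.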