Adding test-preserving iteration order for deduplication

dennlinger / summaries

A toolkit for summarization analysis and aspect-based summarizers

MIT License


Adding test-preserving iteration order for deduplication #53

Closed by dennlinger 1 year ago

dennlinger commented 1 year ago

Reversing the iteration order over splits lets deduplication retain more of the test and validation sets and remove samples primarily from the train set. This may be desirable for keeping results comparable with existing splits and evaluation setups. Addresses #52.

Also adds an option to suppress the printed breakdown of removed samples.
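
To illustrate the idea, here is a minimal sketch of order-dependent deduplication across splits. It is not the code from this PR: the function name `deduplicate_splits` and the parameters `test_first` and `print_breakdown` are hypothetical, and it assumes exact-match duplicates, a default split order of train, validation, test, and that the first occurrence encountered is the one that is kept.

```python
from typing import Dict, List, Tuple


def deduplicate_splits(
    splits: Dict[str, List[str]],
    test_first: bool = True,        # hypothetical flag: reverse the split order
    print_breakdown: bool = True,   # hypothetical flag: print removed counts
) -> Tuple[Dict[str, List[str]], Dict[str, int]]:
    """Drop exact duplicates across splits, keeping the first occurrence seen."""
    # Assumed default order is the dict order (train -> validation -> test).
    order = list(splits.keys())
    if test_first:
        # Visit test first so its samples are "seen" before train is processed.
        order.reverse()

    seen = set()
    cleaned: Dict[str, List[str]] = {}
    removed: Dict[str, int] = {}
    for name in order:
        kept = []
        for sample in splits[name]:
            if sample in seen:
                continue  # duplicate of a sample from an earlier-visited split
            seen.add(sample)
            kept.append(sample)
        cleaned[name] = kept
        removed[name] = len(splits[name]) - len(kept)

    if print_breakdown:
        for name in splits:
            print(f"{name}: removed {removed[name]} of {len(splits[name])} samples")
    return cleaned, removed


splits = {
    "train": ["a", "b", "c", "c"],
    "validation": ["b", "d"],
    "test": ["a", "e"],
}
cleaned, removed = deduplicate_splits(splits, test_first=True, print_breakdown=False)
```

In this toy example, the reversed order leaves the test and validation splits untouched and removes all three duplicates from train. With `test_first=False`, each split would lose one sample instead.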