dennlinger / summaries

A toolkit for summarization analysis and aspect-based summarizers
MIT License

Analysis of generated/reference sequences #31

Closed: dennlinger closed this 2 years ago

dennlinger commented 2 years ago

This is the current draft proposal of tools for analyzing sequences. It incorporates several ideas:

Also includes a minor bug fix for the existing aligners.

dennlinger commented 2 years ago

Unfortunately, this PR also includes some preliminary experiments on MLSUM, which technically belong in a separate PR. Importantly, though, these exactly match the results obtained by Philip May, whose post is linked above.

I also realized that a single function analyzing samples might be counterproductive, since it would hide whether (and how many) samples have several issues at once. For example, empty samples will also turn up as having a summary longer than or equal to the reference text length, because 0 ≥ 0. Instead, these checks remain separate functions for now and might be tied together later.
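To illustrate why separate checks keep overlaps visible, here is a minimal sketch. The function names (`is_empty`, `summary_not_shorter`, `flag_samples`) and the whitespace-token length measure are hypothetical, not the toolkit's actual API. Each check records its own sample indices, so the overlap between "empty" and "summary not shorter than reference" can be computed explicitly instead of being silently merged into one count.

```python
from typing import Dict, List


def is_empty(summary: str, reference: str) -> bool:
    """Hypothetical check: flag samples where either side has no non-whitespace text."""
    return not summary.strip() or not reference.strip()


def summary_not_shorter(summary: str, reference: str) -> bool:
    """Hypothetical check: flag samples whose summary has at least as many
    whitespace tokens as the reference (an empty/empty pair passes, since 0 >= 0)."""
    return len(summary.split()) >= len(reference.split())


def flag_samples(summaries: List[str], references: List[str]) -> Dict[str, List[int]]:
    """Run every check independently and keep the flagged indices per check."""
    checks = {"empty": is_empty, "summary_not_shorter": summary_not_shorter}
    flagged: Dict[str, List[int]] = {name: [] for name in checks}
    for idx, (summ, ref) in enumerate(zip(summaries, references)):
        for name, check in checks.items():
            if check(summ, ref):
                flagged[name].append(idx)
    return flagged
```

Because the indices are kept per check, a set intersection such as `set(flagged["empty"]) & set(flagged["summary_not_shorter"])` shows exactly which samples are counted twice, which a single combined function would obscure.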

I also realized that there are some inconsistencies with respect to lemmatization (see issue #33), which is not yet fully propagated to the "lower-level" functions.
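As a rough sketch of what "propagating" lemmatization could mean, assume the lower-level helpers are n-gram based and accept an optional lemmatizer callable. All names here (`tokenize`, `novel_ngram_ratio`, `strip_plural`) are hypothetical, and `strip_plural` is only a toy stand-in for a real lemmatizer. The key point is that the top-level function forwards the same option to every tokenization call; dropping it on one side changes the result.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical type: a lemmatizer maps a single token to its base form.
Lemmatizer = Callable[[str], str]


def strip_plural(token: str) -> str:
    """Toy stand-in for a real lemmatizer: drop a trailing 's' on longer tokens."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def tokenize(text: str, lemmatizer: Optional[Lemmatizer] = None) -> List[str]:
    """Lowercase, split on whitespace, and optionally lemmatize each token."""
    tokens = text.lower().split()
    if lemmatizer is not None:
        tokens = [lemmatizer(tok) for tok in tokens]
    return tokens


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Set of contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(summary: str, reference: str, n: int = 1,
                      lemmatizer: Optional[Lemmatizer] = None) -> float:
    """Fraction of summary n-grams that do not occur in the reference.

    The lemmatizer is forwarded to BOTH tokenize() calls; applying it to
    only one side would make the comparison inconsistent.
    """
    summary_ngrams = ngrams(tokenize(summary, lemmatizer), n)
    reference_ngrams = ngrams(tokenize(reference, lemmatizer), n)
    if not summary_ngrams:
        return 0.0
    return len(summary_ngrams - reference_ngrams) / len(summary_ngrams)
```

For "the cats sleep" against "the cat sleeps", two of the three summary unigrams count as novel without lemmatization, while with `strip_plural` forwarded to both sides none do.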