Analysis of generated/reference sequences

dennlinger / summaries

A toolkit for summarization analysis and aspect-based summarizers

MIT License

Current draft proposal of tools for the analysis of sequences. Several ideas are incorporated here:

[x] Extracting the longest common subsequence (LCS) of a target sequence with the reference. This measures the level of extractivity beyond simple exact matches of the entire summary. However, this is rather expensive to compute.
[x] Extracting the fraction of ROUGE-2 overlap (probably the recall?), which approximates the LCS overlap value mentioned above.
[x] Exact matches, which are used by others to analyze extractivity problems, for example in this blog post.
[x] Along the lines of Jiahui's project, counting the number of "invalid" samples, i.e., samples that are either extractive copies or otherwise faulty (empty, or longer than the input text).
[x] Finding repeating n-grams within the generated output text.
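
For illustration, the sequence-level measures in the list above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the function names (`lcs_length`, `bigram_recall`, `repeated_ngrams`) are hypothetical, inputs are assumed to be pre-tokenized lists of strings, and the bigram recall is a simplified stand-in for ROUGE-2 (no stemming or special tokenization).

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences.

    Standard dynamic program; time is proportional to len(a) * len(b),
    which is why this gets slow for long texts.
    """
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams; empty if the sequence is shorter than n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_recall(summary: Sequence[str], reference: Sequence[str]) -> float:
    """Clipped bigram overlap divided by the number of reference bigrams."""
    summary_counts = Counter(ngrams(summary, 2))
    reference_counts = Counter(ngrams(reference, 2))
    total = sum(reference_counts.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count per bigram (clipping).
    overlap = sum((summary_counts & reference_counts).values())
    return overlap / total


def repeated_ngrams(tokens: Sequence[str], n: int) -> Dict[Tuple[str, ...], int]:
    """N-grams that occur more than once within a single sequence."""
    counts = Counter(ngrams(tokens, n))
    return {gram: count for gram, count in counts.items() if count > 1}


summary = "the cat sat on the mat".split()
reference = "the cat lay on the mat".split()
print(lcs_length(summary, reference))       # 5
print(bigram_recall(summary, reference))    # 0.6
print(repeated_ngrams("a b a b c".split(), 2))  # {('a', 'b'): 2}
```

How the LCS length is normalized (by summary length or by reference length) is left open here, since the description above does not specify it.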

Also includes a minor bug fix for the existing aligners.

Unfortunately, this PR also includes some preliminary experiments on MLSUM, which technically belong in a separate PR. Importantly, though, the results exactly match those obtained by Philip May, whose post is linked above.

I also realized that a single function analyzing samples might be counterproductive, since it is not clear whether (or how many) samples have several issues (e.g., empty samples will also turn up as having a "longer/equal summary than reference text length"). Instead, these remain separate functions for now, which might be tied together later.
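
To make the overlap between issues concrete, here is a minimal sketch of separate per-sample checks combined only at the reporting stage. All names (`is_empty`, `is_exact_copy`, `is_too_long`, `flag_sample`) are hypothetical and not the functions in this PR. The sketch assumes a sample counts as empty when either text is blank, and that length is compared in characters.

```python
from collections import Counter
from typing import Dict, List, Tuple


def is_empty(summary: str, reference: str) -> bool:
    """Assumption: a sample is 'empty' if either side is blank."""
    return not summary.strip() or not reference.strip()


def is_exact_copy(summary: str, reference: str) -> bool:
    """Summary is identical to the reference text (ignoring outer whitespace)."""
    return summary.strip() == reference.strip()


def is_too_long(summary: str, reference: str) -> bool:
    """Summary is at least as long as the reference text (in characters)."""
    return len(summary) >= len(reference)


def flag_sample(summary: str, reference: str) -> Dict[str, bool]:
    """Run every check independently so overlapping issues stay visible."""
    return {
        "empty": is_empty(summary, reference),
        "exact_copy": is_exact_copy(summary, reference),
        "too_long": is_too_long(summary, reference),
    }


def issue_combinations(samples: List[Tuple[str, str]]) -> Counter:
    """Count how often each combination of issues occurs across samples."""
    combos = Counter()
    for summary, reference in samples:
        flags = flag_sample(summary, reference)
        combos[tuple(name for name, hit in flags.items() if hit)] += 1
    return combos


samples = [
    ("A short summary.", "A much longer reference text about things."),
    ("Some summary.", ""),          # blank reference: empty AND too long
    ("Same text.", "Same text."),   # exact copy AND too long
]
for combo, count in issue_combinations(samples).items():
    print(combo or ("no issue",), count)
```

Keeping the checks separate means a sample with a blank reference text is reported under both "empty" and "too_long", instead of being silently assigned to whichever check happens to run first.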

I also noticed some inconsistencies in the lemmatization (see issue #33), which is not yet fully propagated to "lower-level" functions.

Analysis of generated/reference sequences #31