dennlinger / summaries

A toolkit for summarization analysis and aspect-based summarizers
MIT License

Detect used alphabet in dataset #58

Closed dennlinger closed 1 year ago

dennlinger commented 1 year ago

Detecting the alphabet used in a dataset is useful for spotting special characters (e.g., the infamous \xa0, a non-breaking space). Combined with Counter, it would also make a great analysis tool for seeing how frequently each character appears.
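As a rough illustration of the idea, here is a minimal sketch that counts the characters of a single text with `collections.Counter` from the Python standard library. The function name `char_frequencies` and the sample string are assumptions made for this example and are not taken from the repository.

```python
from collections import Counter


def char_frequencies(text: str) -> Counter:
    """Count how often each character occurs in a single text (hypothetical helper)."""
    return Counter(text)


# Made-up sample containing two non-breaking spaces (\xa0).
sample = "price:\xa010\xa0EUR"
freqs = char_frequencies(sample)

# Keep only characters that are not printable ASCII; repr() makes invisible ones visible.
unusual = {
    repr(ch): count
    for ch, count in freqs.items()
    if not (ch.isascii() and ch.isprintable())
}
print(unusual)
```

Filtering on printable ASCII is only one possible definition of "special"; a dataset in another language would need a different baseline.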

The only problem: this would be part of Analyzer but should operate at the level of a dataset, which again makes it inconsistent (see #37).

dennlinger commented 1 year ago

This can be solved relatively easily by "adding" the Counter objects of the individual elements. Doing so slows down the analysis quite a bit (roughly a factor of three), but otherwise adds minimal code overhead.
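The "adding" described above could look roughly like the following sketch, which assumes that each dataset element is a plain string. The function name `dataset_alphabet` and the example texts are hypothetical; this is not the code that was merged.

```python
from collections import Counter
from typing import Iterable


def dataset_alphabet(texts: Iterable[str]) -> Counter:
    """Aggregate per-element character counts into one dataset-level Counter."""
    total: Counter = Counter()
    for text in texts:
        # "Adding" two Counters sums the counts of matching characters.
        total = total + Counter(text)
    return total


# Made-up mini dataset; the second and third texts contain \xa0.
texts = ["ab", "a\xa0c", "\xa0"]
alphabet = dataset_alphabet(texts)
print(sorted(alphabet))          # the distinct characters that occur
print(alphabet.most_common(2))   # the two most frequent characters
```

Note that `+` builds a new Counter on every iteration. Calling `total.update(text)` in place is a possible alternative, though whether it would change the slowdown reported above has not been measured here.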

dennlinger commented 1 year ago

Resolved through #64.