Currently, some of the functions of Analyzer work on the level of a single sample, whereas others work on an entire input dataset at once.
It would make sense to restructure the Analyzer around a more streamlined interface, or otherwise to separate concerns more strictly (possibly a dedicated Deduplicator class would be helpful).
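A minimal sketch of what such a split could look like, assuming the tool is written in Python. The `Sample` type, the method names, and the exact-match duplicate criterion are all hypothetical and do not reflect the current code; the point is only that per-sample logic stays in Analyzer while dataset-level logic moves to Deduplicator.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Sample:
    # Hypothetical container for one reference/summary pair.
    reference: str
    summary: str


class Analyzer:
    """Per-sample operations only (hypothetical interface)."""

    def compression_ratio(self, sample: Sample) -> float:
        # Reference length divided by summary length, in whitespace tokens.
        ref_len = len(sample.reference.split())
        sum_len = len(sample.summary.split())
        return ref_len / sum_len if sum_len else float("inf")


class Deduplicator:
    """Dataset-level operations only (hypothetical interface)."""

    def deduplicate(self, samples: Iterable[Sample]) -> List[Sample]:
        # Assumption: a duplicate is an exact match on both fields.
        # The first occurrence is kept and input order is preserved.
        seen = set()
        unique = []
        for sample in samples:
            key = (sample.reference, sample.summary)
            if key not in seen:
                seen.add(key)
                unique.append(sample)
        return unique
```

With this separation, a caller would run Deduplicator once over the dataset and then apply Analyzer to each remaining sample, and the deduplication step could be skipped or relaxed without touching the per-sample code.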
Also, the current tool is entirely focused on single-document summarization. Deduplication, for example, might not be necessary (or at least only to a lesser degree) in a multi-document summarization (MDS) setting, where we may want to retain near-duplicate inputs whose exact content still varies.