More customization options for `Cleaner`:

- Allow users to pass self-defined functions to check samples, in particular so that it is possible to specify exactly why a sample was filtered.
- Slightly changed the logic that determines which filtering reason was applied.

Aside from this, some initial experiments have been added, essentially applying `Cleaner` to the German datasets. It turns out that the MassiveSumm German subset (kindly provided by the author, Daniel Varab) still contains a high number of duplicates and short samples, which I am currently in the process of filtering out. These experiments in turn gave me some ideas on how to improve the filtering setup (in particular, #46), which I might tackle in the future.
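To illustrate the idea behind user-defined checks, here is a minimal sketch. It is hypothetical and not the actual `Cleaner` API: the check signature (a function taking a reference and a summary and returning a reason string, or `None` if the sample passes), the `clean` helper, and the example checks `too_short` and `summary_not_shorter` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical check signature (not the real Cleaner API): receives a
# (reference, summary) pair and returns a reason string if the sample
# should be filtered, or None if it passes.
Check = Callable[[str, str], Optional[str]]


def too_short(reference: str, summary: str, min_len: int = 10) -> Optional[str]:
    """Hypothetical user-defined check: flag references with few words."""
    if len(reference.split()) < min_len:
        return "reference_too_short"
    return None


def summary_not_shorter(reference: str, summary: str) -> Optional[str]:
    """Hypothetical user-defined check: flag summaries at least as long as the reference."""
    if len(summary) >= len(reference):
        return "summary_not_shorter"
    return None


def clean(
    samples: List[Tuple[str, str]], checks: List[Check]
) -> Tuple[List[Tuple[str, str]], Counter]:
    """Keep samples that pass all checks and count why the others were dropped.

    Checks run in order and evaluation stops at the first failing check,
    so each filtered sample is attributed to exactly one reason.
    """
    kept: List[Tuple[str, str]] = []
    reasons: Counter = Counter()
    for reference, summary in samples:
        # Lazily evaluate checks; take the first non-None reason, if any.
        results = (check(reference, summary) for check in checks)
        reason = next((r for r in results if r is not None), None)
        if reason is None:
            kept.append((reference, summary))
        else:
            reasons[reason] += 1
    return kept, reasons


# Usage with made-up toy samples.
samples = [
    ("a " * 20, "a short summary"),  # passes both checks
    ("too short", "x"),              # only 2 words in the reference
    ("word " * 12, "word " * 30),    # summary longer than reference
]
kept, reasons = clean(samples, [too_short, summary_not_shorter])
print(len(kept), dict(reasons))
```

Returning a reason string instead of a plain boolean is what makes it possible to report *why* a sample was dropped. This sketch attributes each filtered sample to the first failing check; other precedence rules would work just as well.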