dennlinger / summaries

A toolkit for summarization analysis and aspect-based summarizers
MIT License

Filter German datasets #49

Closed dennlinger closed 1 year ago

dennlinger commented 1 year ago

Re-opening #1 with a new task focus: the first step is to filter all of the existing German datasets. This also depends on "finishing" a working version of Cleaner.

List of obtained datasets:

Edited to reflect that each of these points will get a separate issue.

dennlinger commented 1 year ago

Also blocked by #52, given that we want to preserve previous splits.
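To illustrate what preserving previous splits means for filtering, here is a minimal sketch, not the actual Cleaner implementation. It assumes each dataset is a dict mapping split names to lists of samples with `text` and `summary` fields; `filter_preserving_splits` and `is_valid` are hypothetical names, and the filter rule is only an example.

```python
from typing import Callable, Dict, List

Sample = Dict[str, str]


def is_valid(sample: Sample) -> bool:
    """Hypothetical example filter (an assumption, not the project's rule set).

    Keeps a sample only if both fields are non-empty and the summary
    is strictly shorter than the reference text.
    """
    text = sample["text"].strip()
    summary = sample["summary"].strip()
    return bool(text) and bool(summary) and len(summary) < len(text)


def filter_preserving_splits(
    splits: Dict[str, List[Sample]],
    keep: Callable[[Sample], bool],
) -> Dict[str, List[Sample]]:
    """Apply the filter inside each split separately.

    Samples are only ever dropped, never moved, so every surviving
    sample stays in the split it was originally assigned to.
    """
    return {
        name: [sample for sample in samples if keep(sample)]
        for name, samples in splits.items()
    }


if __name__ == "__main__":
    data = {
        "train": [
            {"text": "Ein langer Artikel über Recht.", "summary": "Kurz."},
            {"text": "", "summary": "Leer."},
        ],
        "test": [
            {"text": "Kurz.", "summary": "Viel längere Zusammenfassung."},
        ],
    }
    cleaned = filter_preserving_splits(data, is_valid)
    for name, samples in cleaned.items():
        print(name, len(samples))
```

Filtering within each split, rather than merging and re-splitting afterwards, keeps results comparable with prior work that used the original partitions.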

dennlinger commented 1 year ago

Opened issue for LegalSum: https://github.com/sebimo/LegalSum/issues/1
Currently awaiting a response from the authors, but it seems we can easily use this dataset, since the downloaded files already look promisingly clean. It is still unclear which columns to use, and there is some discrepancy in the provided file count.

dennlinger commented 1 year ago

#52 is also done now, so cleaning is generally possible with the preliminary filters.

On top of that, though, it would be great to compare the stats before and after filtering. This first requires including Jiahui's code, see #50.
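As a rough idea of what a before/after comparison could report, here is a minimal sketch. It is not the code referenced in #50; the function names, the `text`/`summary` field names, and the whitespace tokenization are all assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple

Sample = Dict[str, str]


def length_stats(samples: List[Sample]) -> Dict[str, float]:
    """Compute simple size statistics using whitespace tokenization."""
    if not samples:
        return {"count": 0, "mean_text_tokens": 0.0, "mean_summary_tokens": 0.0}
    text_lengths = [len(sample["text"].split()) for sample in samples]
    summary_lengths = [len(sample["summary"].split()) for sample in samples]
    return {
        "count": len(samples),
        "mean_text_tokens": mean(text_lengths),
        "mean_summary_tokens": mean(summary_lengths),
    }


def compare_stats(
    before: List[Sample], after: List[Sample]
) -> Dict[str, Tuple[float, float]]:
    """Pair each statistic as (value before filtering, value after filtering)."""
    stats_before = length_stats(before)
    stats_after = length_stats(after)
    return {key: (stats_before[key], stats_after[key]) for key in stats_before}


if __name__ == "__main__":
    unfiltered = [
        {"text": "a b c d", "summary": "a b"},
        {"text": "", "summary": ""},
    ]
    # Stand-in for the output of the real filters: drop empty samples.
    filtered = [sample for sample in unfiltered if sample["text"].strip()]
    for key, (old, new) in compare_stats(unfiltered, filtered).items():
        print(f"{key}: {old} -> {new}")
```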

dennlinger commented 1 year ago

Incorporated in #63.