dennlinger closed this 1 year ago
Also blocked by #52, given that we want to preserve the previous splits.
Opened an issue for LegalSum: https://github.com/sebimo/LegalSum/issues/1
Currently awaiting a response from the authors, but it seems we can use this easily, since the downloaded files already look promisingly clean. It is unclear which columns to use, and there is also some discrepancy in the provided file count.
#52 is also done now, so cleaning is generally possible with the preliminary filters.
On top of that, it would be great to compare the stats before and after filtering. For this, we first need to include Jiahui's code; see #50.
Incorporated in #63.
Re-opening #1 with a new task focus, which is to first filter all of the existing German datasets. This also depends on "finishing" a working version of Cleaner.
List of obtained datasets:
Editing this to reflect that each of these points will get a separate issue.