DavidNemeskey / cc_corpus

Tools for compiling corpora from Common Crawl
GNU Lesser General Public License v3.0

Fixes #16

Closed: DavidNemeskey closed this 3 years ago

DavidNemeskey commented 3 years ago

A few fixes:

Note that the latter two fixes only concern the emtsv output and must be applied manually by running fix_corpus.py.