DavidNemeskey / cc_corpus

Tools for compiling corpora from Common Crawl
GNU Lesser General Public License v3.0
12 stars, 1 fork

Add language filter to filter_warc.py #26

Open. DavidNemeskey opened this issue 1 year ago.