DavidNemeskey / cc_corpus

Tools for compiling corpora from Common Crawl
GNU Lesser General Public License v3.0

Bib support #50

Closed by DavidNemeskey 1 year ago

DavidNemeskey commented 1 year ago

Bib support added. Even though roughly 99% of the entries will most likely be filtered out (most abstracts are in English), at least we are not introducing random punctuation sequences...