DavidNemeskey / cc_corpus

Tools for compiling corpora from Common Crawl
GNU Lesser General Public License v3.0

New indexing #44

Closed by DavidNemeskey 1 year ago

DavidNemeskey commented 1 year ago

... that accesses data via S3 (through an HTTP proxy), not the index server.
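
For illustration only, here is a minimal sketch of what reading index data from the S3-backed files over HTTP could look like, as opposed to querying the index server. It is not the implementation in this repository. The base URL `https://data.commoncrawl.org` and the `cc-index/collections/<crawl>/indexes/cdx-NNNNN.gz` layout are assumptions about the public Common Crawl data set, and the helper names `index_shard_url` and `range_request` are hypothetical.

```python
from urllib.request import Request

# Assumption: public HTTP(S) endpoint in front of the Common Crawl S3 bucket.
BASE_URL = "https://data.commoncrawl.org"


def index_shard_url(crawl_id: str, shard: int) -> str:
    """Return the URL of one CDX index shard of a crawl (hypothetical helper)."""
    return f"{BASE_URL}/cc-index/collections/{crawl_id}/indexes/cdx-{shard:05d}.gz"


def range_request(url: str, offset: int, length: int) -> Request:
    """Build, but do not send, a GET request for `length` bytes at `offset`."""
    last_byte = offset + length - 1  # HTTP byte ranges are inclusive
    return Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})
```

A request built this way can be passed to `urllib.request.urlopen` to download only the needed part of a shard. Whether a whole shard or a byte range is fetched depends on how the index is searched, which this issue does not specify.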