DavidNemeskey / cc_corpus

Tools for compiling corpora from Common Crawl
GNU Lesser General Public License v3.0

S3 download of index #65

Open acheronw opened 4 months ago

acheronw commented 4 months ago

A version that uses the boto3 library to download the index files in step 01 (get indexfiles).
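For illustration, a minimal sketch of what an S3-based index download with boto3 could look like. This is not the code from this PR: the helper names `index_keys` and `download_index`, the `num_files` parameter, and the `cc-index/collections/<collection>/indexes/cdx-NNNNN.gz` key layout in the `commoncrawl` bucket are assumptions made for the example, and it presumes AWS credentials are already configured for boto3.

```python
from pathlib import Path


def index_keys(collection: str, num_files: int) -> list[str]:
    """Build the S3 keys of the index shards for one crawl.

    Assumption: shards are named cdx-00000.gz, cdx-00001.gz, ... under
    cc-index/collections/<collection>/indexes/.
    """
    return [
        f"cc-index/collections/{collection}/indexes/cdx-{i:05d}.gz"
        for i in range(num_files)
    ]


def download_index(collection, out_dir, num_files, bucket="commoncrawl", client=None):
    """Download the index shards of `collection` into `out_dir`.

    `client` is any object with a boto3-style download_file(Bucket, Key,
    Filename) method; if omitted, a real boto3 S3 client is created.
    """
    if client is None:
        import boto3  # imported lazily so the helpers work without boto3 installed

        client = boto3.client("s3")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    downloaded = []
    for key in index_keys(collection, num_files):
        target = out / Path(key).name
        client.download_file(bucket, key, str(target))
        downloaded.append(target)
    return downloaded
```

A call such as `download_index("CC-MAIN-2023-50", "index_files", num_files=2)` would then fetch the first two shards of that crawl into `index_files/`, assuming the key layout above matches the bucket.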