facebookresearch / cc_net

Tools to download and cleanup Common Crawl data
MIT License
932 stars 138 forks

Does cc_net provide an existing monolingual corpus? #52

Open yangyang0202 opened 11 months ago

yangyang0202 commented 11 months ago

https://data.statmt.org/cc-100/ only provides the corpus extracted in 2018. Is any corpus extracted after 2018 available?