facebookresearch / cc_net

Tools to download and cleanup Common Crawl data
MIT License

Does cc_net provide an existing monolingual corpus? #52

Open yangyang0202 opened 1 year ago

yangyang0202 commented 1 year ago

The corpus at https://data.statmt.org/cc-100/ only covers data extracted in 2018. Is there a corpus available for any later period?