facebookresearch / cc_net

Tools to download and cleanup Common Crawl data
MIT License

Questions about the stats.json configuration file #42

Open QHPHBias opened 1 year ago

QHPHBias commented 1 year ago

I want to crawl the latest 2023-06 snapshot. How do I configure my stats.json? I noticed that the JSON file has two fields per entry, size and checksum. How are these two values defined, and how can I obtain them?
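For illustration, here is a minimal sketch of how a size and checksum pair could be computed for one output shard. It rests on assumptions that are not confirmed anywhere in this thread: that each shard is a gzipped JSON-lines file, that size is the character length of the shard's sorted lines joined by newlines, and that checksum is the SHA-1 hex digest of that same text. The helper name shard_stats is hypothetical and is not part of cc_net.

```python
import gzip
import hashlib
from pathlib import Path


def shard_stats(path: Path) -> dict:
    """Build a hypothetical {"size", "checksum"} entry for one shard.

    Assumption: the shard is a gzipped file with one JSON document per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        # Sort the lines so the result does not depend on document order.
        lines = sorted(line.rstrip("\n") for line in f)
    content = "\n".join(lines)
    return {
        # Assumption: size is the length of the joined text, in characters.
        "size": len(content),
        # Assumption: checksum is a SHA-1 hex digest of the UTF-8 bytes.
        "checksum": hashlib.sha1(content.encode("utf-8")).hexdigest(),
    }
```

Calling shard_stats(Path("some_shard.json.gz")) on a locally produced shard would return a dictionary that could be compared with, or written into, a stats.json entry. If the project computes these values differently, the hashing and length logic would need to change to match.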