AmosHua closed this issue 5 years ago
Hello,
It looks like you forgot to run the commands launched by data_creation/slurm_scripts/eli_merge_docs_launcher.sh
Specifically, the file you're looking for is created by running the following AFTER you have processed all 100 slices of the CommonCrawl:
python merge_support_docs.py explainlikeimfive 0
Note, however, that the data creation code was written with academic and industry clusters in mind: the support document collection step has to download and filter a whole CommonCrawl dump, which would take several weeks on a single machine.
Hi, I was trying to build this dataset on my PC, so I did not use Slurm. However, when I ran merge_support_docs.py, it failed with the error "FileNotFoundError: [Errno 2] No such file or directory: 'processed_data/collected_docs/explainlikeimfive/0.json'". Here is my collected docs document
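For anyone hitting the same FileNotFoundError, here is a minimal sketch of a check that can be run before the merge step. It only tests whether the file named in the error message exists and lists what is actually in that directory. The helper name report_collected_docs is hypothetical and not part of the repository, and the directory layout is assumed only from the path in the traceback above.

```python
from pathlib import Path


def report_collected_docs(base_dir, name="explainlikeimfive", expected="0.json"):
    """Hypothetical helper, not part of the ELI5 data creation code.

    Returns (expected_exists, entries), where entries is the sorted list
    of file names found in base_dir/name, or an empty list if that
    directory does not exist.
    """
    doc_dir = Path(base_dir) / name
    entries = sorted(p.name for p in doc_dir.iterdir()) if doc_dir.is_dir() else []
    return (doc_dir / expected).is_file(), entries


if __name__ == "__main__":
    # Path taken from the error message; adjust if your layout differs.
    found, entries = report_collected_docs("processed_data/collected_docs")
    print("0.json present:", found)
    print("directory contents:", entries)
```

If the directory is empty or missing, the earlier collection steps probably did not produce their output yet, and running the merge command again will not help.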