facebookresearch / ELI5

Scripts and links to recreate the ELI5 dataset.

Building on PC #6

Closed: AmosHua closed this issue 5 years ago

AmosHua commented 5 years ago

Hi, I am trying to build this dataset on my PC without using Slurm. However, when I run 'merge_support_docs.py', it fails with the error "FileNotFoundError: [Errno 2] No such file or directory: 'processed_data/collected_docs/explainlikeimfive/0.json'". Here is my collected docs directory: [image]

yjernite commented 5 years ago

Hello,

It looks like you forgot to run the commands launched by data_creation/slurm_scripts/eli_merge_docs_launcher.sh.

Specifically, the file you're looking for is created by running the following AFTER you have processed all 100 slices of the CommonCrawl:

python merge_support_docs.py explainlikeimfive 0

Note however that the data creation code was written with academic and industry clusters in mind: the support document collection part of the process needs to download and filter a whole CommonCrawl dump, which would take several weeks on a single machine.
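For anyone running the steps by hand without Slurm, a quick check for which merged files are still missing can save a failed run. The following is only a minimal sketch, not part of the repository: the helper name `missing_merged_files` and the `n_parts` parameter are hypothetical, the `<index>.json` naming is inferred from the error message above, and the value 10 is a placeholder rather than a number taken from the launcher script.

```python
from pathlib import Path


def missing_merged_files(base_dir, subreddit, n_parts):
    """Return the expected '<index>.json' paths that do not exist yet.

    Hypothetical helper: the file layout is inferred from the error
    message in this thread, not from the repository code.
    """
    folder = Path(base_dir) / subreddit
    expected = [folder / f"{i}.json" for i in range(n_parts)]
    return [p for p in expected if not p.is_file()]


if __name__ == "__main__":
    # n_parts=10 is a placeholder; check the launcher script for the real range.
    missing = missing_merged_files(
        "processed_data/collected_docs", "explainlikeimfive", 10
    )
    for path in missing:
        print(f"missing: {path}")
        # The command form below is the one quoted earlier in this thread.
        print(f"  try: python merge_support_docs.py explainlikeimfive {path.stem}")
```

Running it from the data_creation directory would list each absent file together with the merge command quoted above for that index.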