Building on PC

facebookresearch / ELI5

Scripts and links to recreate the ELI5 dataset.


Hello,

It looks like you forgot to run the commands that data_creation/slurm_scripts/eli_merge_docs_launcher.sh launches.

Specifically, the file you're looking for is created by running the following command AFTER you have processed all 100 slices of the CommonCrawl dump:

python merge_support_docs.py explainlikeimfive 0
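The ordering matters: the merge should only run once every slice has finished. Below is a minimal sketch of a guard you could put in front of the merge when running on a single machine. It is not part of the ELI5 repository. The per-slice output filename pattern (SLICE_PATTERN) is a hypothetical placeholder, so replace it with whatever your slice jobs actually write. The merge arguments are copied from the command above, and the script uses sys.executable in place of a bare "python".

```python
import os
import subprocess
import sys

# Number of CommonCrawl slices that must be processed before merging.
N_SLICES = 100

# HYPOTHETICAL: the real per-slice output names come from the repo's own
# scripts; adjust this pattern to match the files your slice jobs produce.
SLICE_PATTERN = "selected_slice_{:02d}.json"


def missing_slices(out_dir, pattern=SLICE_PATTERN, n_slices=N_SLICES):
    """Return the indices of slices whose output file does not exist yet."""
    return [
        i
        for i in range(n_slices)
        if not os.path.isfile(os.path.join(out_dir, pattern.format(i)))
    ]


def merge_command(subreddit="explainlikeimfive", arg="0"):
    """Build the merge command quoted in the reply, with the same arguments."""
    return [sys.executable, "merge_support_docs.py", subreddit, arg]


if __name__ == "__main__":
    out_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_slices(out_dir)
    if missing:
        # Refuse to merge until every slice has been processed.
        sys.exit(f"{len(missing)} slices not processed yet, e.g. {missing[:5]}")
    subprocess.run(merge_command(), check=True)
```

With check=True, a failing merge raises CalledProcessError and is not silently ignored.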

Note, however, that the data creation code was written with academic and industry clusters in mind: the support document collection step has to download and filter an entire CommonCrawl dump, which would take several weeks on a single machine.

Building on PC #6