facebookresearch / dpr-scale

Scalable training for dense retrieval models.

Reddit data download link broken #11

Closed jordane95 closed 1 year ago

jordane95 commented 1 year ago

Hi, I'm trying to download the 200M Reddit data, but it seems that the URL is broken:

$ wget https://dl.fbaipublicfiles.com/dpr_scale/reddit/train.200M.jsonl
--2023-04-18 15:05:43--  https://dl.fbaipublicfiles.com/dpr_scale/reddit/train.200M.jsonl
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 13.32.50.61, 13.32.50.10, 13.32.50.72, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|13.32.50.61|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-04-18 15:05:44 ERROR 403: Forbidden.

ccsasuke commented 1 year ago

Hi @jordane95, upon checking, unfortunately this data has been purged. I can try to recover it, but this is difficult and often impossible. I'll update here if I have any luck retrieving this file.

jordane95 commented 1 year ago

Thanks for your effort! Btw, do you still have the Reddit data preprocessing script? That would be another way to reproduce the data.

ccsasuke commented 1 year ago

@jordane95 The link has been fixed, sorry for the inconvenience.