facebookresearch / ELI5

Scripts and links to recreate the ELI5 dataset.

403 Forbidden when downloading common crawl data #34

Open velocityCavalry opened 2 years ago

velocityCavalry commented 2 years ago

Bug description

Hi, I was trying to download the supporting documents by running wget https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2018-34/wet.paths.gz, but it keeps failing with the following output:

Resolving commoncrawl.s3.amazonaws.com (commoncrawl.s3.amazonaws.com)... 52.217.87.76
Connecting to commoncrawl.s3.amazonaws.com (commoncrawl.s3.amazonaws.com)|52.217.87.76|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-07-11 15:01:56 ERROR 403: Forbidden.

I've tried on several different machines, and it fails on all of them.

Expected behavior

Successful downloads.

Thank you in advance!

SunYuanKang commented 1 year ago

Hi, I have the same problem. Could you please tell me how you dealt with it? Thanks a lot.

llllooong commented 1 year ago

same problem

yidong72 commented 1 year ago

Try this link? https://data.commoncrawl.org/crawl-data/CC-MAIN-2018-34/wet.paths.gz
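
If that link works for you, the only difference from the failing URL is the hostname (commoncrawl.s3.amazonaws.com becomes data.commoncrawl.org), while the path stays the same. Here is a minimal sketch of how the other Common Crawl URLs could be rewritten the same way before downloading. The helper names `to_data_host` and `download` are made up for illustration and are not part of the ELI5 scripts, and the sketch assumes every file is served under the same path on the new host.

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit

# Host from the failing wget command and host from the suggested link.
OLD_HOST = "commoncrawl.s3.amazonaws.com"
NEW_HOST = "data.commoncrawl.org"


def to_data_host(url: str) -> str:
    """Swap the old S3 hostname for the new one; leave other URLs untouched."""
    parts = urlsplit(url)
    if parts.netloc == OLD_HOST:
        parts = parts._replace(netloc=NEW_HOST)
    return urlunsplit(parts)


def download(url: str, dest: str) -> None:
    """Hypothetical helper: fetch a URL after rewriting its host.

    Assumption: the path on the new host matches the old one.
    """
    urllib.request.urlretrieve(to_data_host(url), dest)
```

For example, calling `download("https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2018-34/wet.paths.gz", "wet.paths.gz")` would request the file from data.commoncrawl.org instead. The same hostname swap should also work if you edit the wget command by hand.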